Basic HBase Operations 03

Continuing from the previous post: one big difference between HBase and a traditional database is that HBase can keep multiple versions of a value, not just the latest one. Versions are distinguished by the timestamp attribute.

Let's look at this concretely. First we add a row visit100, then set the column personinfo:name several times, querying the latest value after each put:

hbase(main):011:0> put 'patientvisit','visit100','personinfo:name','A'
0 row(s) in 0.0890 seconds

hbase(main):012:0> get 'patientvisit','visit100','personinfo:name'
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342758182, value=A                                                                                                   
1 row(s) in 0.0310 seconds

hbase(main):013:0> put 'patientvisit','visit100','personinfo:name','B'
0 row(s) in 0.0100 seconds

hbase(main):014:0> get 'patientvisit','visit100','personinfo:name'
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342796166, value=B                                                                                                   
1 row(s) in 0.0230 seconds

hbase(main):015:0> put 'patientvisit','visit100','personinfo:name','B'
0 row(s) in 0.0140 seconds

hbase(main):016:0> get 'patientvisit','visit100','personinfo:name'
COLUMN                                        CELL                                                                                                                               
 personinfo:name                              timestamp=1454342802829, value=B                                                                                                   
1 row(s) in 0.0120 seconds

As you can see, every time the column is set, the timestamp changes accordingly:

timestamp value
1454342758182 A
1454342796166 B
1454342802829 B
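
A plain get only ever returns the newest cell. Whether the older values can still be read back depends on the column family's VERSIONS setting (in HBase 1.x a family keeps only one version per column by default). Assuming the family retains several versions, the shell can fetch them with get 'patientvisit','visit100',{COLUMN=>'personinfo:name',VERSIONS=>3}, and a minimal Java sketch (assuming the HBase client configuration, e.g. hbase-site.xml, is on the classpath) looks like this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetVersionsTest {
	public static void main(String[] args) throws IOException {
		// reads hbase-site.xml from the classpath
		Configuration conf = HBaseConfiguration.create();
		try (Connection conn = ConnectionFactory.createConnection(conf);
				Table table = conn.getTable(TableName.valueOf("patientvisit"))) {
			Get get = new Get(Bytes.toBytes("visit100"));
			get.addColumn(Bytes.toBytes("personinfo"), Bytes.toBytes("name"));
			get.setMaxVersions(3); // ask for up to 3 versions instead of only the latest
			Result result = table.get(get);
			for (Cell cell : result.getColumnCells(Bytes.toBytes("personinfo"), Bytes.toBytes("name"))) {
				// print "timestamp => value" for each stored version
				System.out.println(cell.getTimestamp() + " => " + Bytes.toString(CellUtil.cloneValue(cell)));
			}
		}
	}
}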


Basic HBase Operations 02

Continuing from the previous post: what does the data in the patientvisit table end up looking like?
To make it easier to read, I have split the data into two tables:

Row Key personinfo personinfoex
empi name sex birthday address patid
visit001 empi001 zhangsan male 1999-12-31 shanghai xxx road pat001
visit002 empi002 lisi male 2000-01-01 beijing pat002
visit003 empi003 wangwu female 1999-12-30 guangzhou pat002

Row Key visitinfo visitinfoex
visitid visittime visitdocid visitdocname
visit001 visit001 2015-07-25 10:10:00 doc001 Dr. Yang
visit002 visit002 2015-07-26 11:11:00 doc001 Dr. Yang
visit003 visit003 2015-07-27 13:13:00 doc002 Dr. Li
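
To check what actually landed in the table, a full scan from the Java client can dump every cell. Here is a minimal sketch (it assumes the HBase client configuration is on the classpath; it simply prints row key, family:qualifier, and value):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanDumpTest {
	public static void main(String[] args) throws IOException {
		Configuration conf = HBaseConfiguration.create();
		try (Connection conn = ConnectionFactory.createConnection(conf);
				Table table = conn.getTable(TableName.valueOf("patientvisit"));
				ResultScanner scanner = table.getScanner(new Scan())) {
			for (Result row : scanner) {
				for (Cell cell : row.rawCells()) {
					// print "rowkey family:qualifier = value" for every stored cell
					System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + " "
							+ Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
							+ Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
							+ Bytes.toString(CellUtil.cloneValue(cell)));
				}
			}
		}
	}
}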


Basic HBase Operations 01

First, a few ways in which HBase differs from a traditional relational database at the logical level:
1. An HBase table definition does not declare columns, only column families (for now, think of a column family as a collection of columns). So when creating a table you only specify the column families, and new columns inside a family need no prior declaration; you simply use them (see the sketch after this list).
2. In HBase, rows are located by key, and scans also run over keys, so the choice of row key matters a great deal. The table design and the row key design largely determine whether the data can be used efficiently.
3. In HBase, row key + column family + column qualifier identifies exactly one cell.
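
For example, creating the patientvisit table used in the follow-up posts needs nothing more than the table name and its column families. The shell command is simply create 'patientvisit','personinfo','personinfoex', and a minimal sketch with the HBase 1.x Java client (assuming the client configuration is on the classpath) looks like this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTableTest {
	public static void main(String[] args) throws IOException {
		Configuration conf = HBaseConfiguration.create();
		try (Connection conn = ConnectionFactory.createConnection(conf);
				Admin admin = conn.getAdmin()) {
			// only column families are declared; individual columns are created on first put
			HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("patientvisit"));
			desc.addFamily(new HColumnDescriptor("personinfo"));
			desc.addFamily(new HColumnDescriptor("personinfoex"));
			admin.createTable(desc);
		}
	}
}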

In fact, from the points above you can see:
1. By constraining how rows work, HBase gains flexible column handling and solves the column-extension problem.
2. In real applications we often write the main table and its related tables into the columns together, duplicates and all, trading wasted space for saved time and thereby solving the query-efficiency problem.
Put another way, data is growing faster than hardware improves; a single machine can no longer process such volumes fast enough, so the only way to meet the speed requirement is distributed processing, splitting the work across many nodes and running it concurrently.
3. Distributed processing has its own cost: inter-node communication, however small, becomes a bottleneck. From that angle, if the data volume is not actually that large, a distributed setup can easily do worse than a well-tuned single node.
4. The upside of distributed processing is that a pile of ordinary machines can match the throughput of a single high-end machine, and the built-in data-redundancy mechanism reduces per-node maintenance; overall, though, the maintenance burden still rises. Whether to adopt it comes down to weighing data volume against maintenance cost.

OK, I have wandered off topic... let's get back to it.

The first step, of course, is to look at the help:

hadoop@hadoop-master:~/Deploy/hbase-1.1.2$ bin/hbase shell

hbase(main):061:0> help
HBase Shell, version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, revoke, user_permission

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):062:0> 


Setting Up an HBase Cluster

1. Have a Hadoop cluster up and running

2. Check HBase's compatibility with your Hadoop version and download a matching release (*if you plan to follow the later posts, I recommend hadoop-2.5.2, hbase-1.1.2, hive-1.2.1, and spark-2.0.0)
S: supported
X: not supported
NT: not tested

                      HBase-0.94.x   HBase-0.98.x   HBase-1.0.x   HBase-1.1.x   HBase-1.2.x
Hadoop-1.0.x          X              X              X             X             X
Hadoop-1.1.x          S              NT             X             X             X
Hadoop-0.23.x         S              X              X             X             X
Hadoop-2.0.x-alpha    NT             X              X             X             X
Hadoop-2.1.0-beta     NT             X              X             X             X
Hadoop-2.2.0          NT             S              NT            NT            NT
Hadoop-2.3.x          NT             S              NT            NT            NT
Hadoop-2.4.x          NT             S              S             S             S
Hadoop-2.5.x          NT             S              S             S             S
Hadoop-2.6.0          X              X              X             X             X
Hadoop-2.6.1+         NT             NT             NT            NT            S
Hadoop-2.7.0          X              X              X             X             X
Hadoop-2.7.1+         NT             NT             NT            NT            S
(Notes: for HBase-0.98.x, support for Hadoop 1.1+ is deprecated; HBase-1.0.x does not support Hadoop 1.x.)


Setting Up a Cassandra Cluster

1. Prerequisites

VirtualBox4
Debian8
JDK8u60
Cassandra3

2. Install the virtual machines and the Guest Additions

su
apt-get install gcc
apt-get install linux-headers-$(uname -r)
apt-get install build-essential
./VBoxLinuxAdditions.run

Set up a shared folder and copy the required files into the VM.
Alternatively, once ssh is configured on the VM, you can copy the files over with scp or WinSCP.

3. Configure two network adapters: the first host-only with a static IP, the second NAT using DHCP.
Edit the configuration file /etc/network/interfaces:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 172.16.172.23
netmask 255.255.0.0
gateway 172.16.172.2

auto eth1
iface eth1 inet dhcp

Edit the hosts file:

#/etc/hosts
127.0.0.1	localhost
172.16.172.23	node01
172.16.172.24	node02
172.16.172.25	node03

Edit the hostname:

#/etc/hostname
node01

If needed (usually it is not), edit the configuration file /etc/resolv.conf:

nameserver xxx.xxx.xxx.xxx

Restart the network interfaces:

su
ifconfig eth0 down
ifconfig eth0 up
ifconfig eth1 down
ifconfig eth1 up


Hadoop CRUD Operations (Java)

All the required jars can be found in the Hadoop distribution. The example below needs at least these:

commons-cli-1.2.jar
commons-collections-3.2.1.jar
commons-configuration-1.6.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
guava-11.0.2.jar
hadoop-auth-2.7.1.jar
hadoop-common-2.7.1.jar
hadoop-hdfs-2.7.1.jar
htrace-core-3.1.0-incubating.jar
log4j-1.2.17.jar
protobuf-java-2.5.0.jar
servlet-api.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar

The code is as follows:

package com.neohope.hadoop.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSTest {

	static Configuration hdfsConfig;
	static {
		hdfsConfig = new Configuration();
		hdfsConfig.addResource(new Path("etc/hadoop/core-site.xml"));
		hdfsConfig.addResource(new Path("etc/hadoop/hdfs-site.xml"));
	}

	// Create a directory
	public static void createDirectory(String dirPath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(dirPath);
		try {
			fs.mkdirs(p);
		} finally {
			fs.close();
		}
	}

	// Delete a directory (deleteOnExit removes the path when the FileSystem is closed below)
	public static void deleteDirectory(String dirPath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(dirPath);
		try {
			fs.deleteOnExit(p);
		} finally {
			fs.close();
		}
	}

	// Rename a directory
	public static void renameDirectory(String oldDirPath, String newDirPath)
			throws IOException {
		renameFile(oldDirPath, newDirPath);
	}

	// List the files in a directory
	public static void listFiles(String dirPath) throws IOException {
		FileSystem hdfs = FileSystem.get(hdfsConfig);
		Path listf = new Path(dirPath);
		try {
			FileStatus statuslist[] = hdfs.listStatus(listf);
			for (FileStatus status : statuslist) {
				System.out.println(status.getPath().toString());
			}
		} finally {
			hdfs.close();
		}
	}

	// Create an empty file
	public static void createFile(String filePath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(filePath);
		try {
			fs.createNewFile(p);
		} finally {
			fs.close();
		}
	}

	// Delete a file (deleteOnExit removes the path when the FileSystem is closed below)
	public static void deleteFile(String filePath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(filePath);
		try {
			fs.deleteOnExit(p);
		} finally {
			fs.close();
		}
	}

	// Rename a file
	public static void renameFile(String oldFilePath, String newFilePath)
			throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path oldPath = new Path(oldFilePath);
		Path newPath = new Path(newFilePath);
		try {
			fs.rename(oldPath, newPath);
		} finally {
			fs.close();
		}
	}

	// Upload a local file to HDFS
	public static void putFile(String localPath, String hdfsPath)
			throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path src = new Path(localPath);
		Path dst = new Path(hdfsPath);
		try {
			fs.copyFromLocalFile(src, dst);
		} finally {
			fs.close();
		}
	}

	// Download a file from HDFS to the local filesystem
	public static void getFile(String hdfsPath, String localPath)
			throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path src = new Path(hdfsPath);
		Path dst = new Path(localPath);
		try {
			fs.copyToLocalFile(false, src, dst, true);
		} finally {
			fs.close();
		}
	}

	// Read a file and print its contents
	public static void readFile(String hdfsPath) throws IOException {
		FileSystem hdfs = FileSystem.get(hdfsConfig);
		Path filePath = new Path(hdfsPath);

		InputStream in = null;
		BufferedReader buff = null;
		try {
			in = hdfs.open(filePath);
			buff = new BufferedReader(new InputStreamReader(in));
			String str = null;
			while ((str = buff.readLine()) != null) {
				System.out.println(str);
			}
		} finally {
			if (buff != null) buff.close();
			if (in != null) in.close();
			hdfs.close();
		}
	}

	public static void main(String[] args) throws IOException {
		System.setProperty("HADOOP_USER_NAME", "hadoop");
		// createDirectory("hdfs://hadoop-master:9000/usr");
		// createDirectory("hdfs://hadoop-master:9000/usr/hansen");
		// createDirectory("hdfs://hadoop-master:9000/usr/hansen/test");
		// renameDirectory("hdfs://hadoop-master:9000/usr/hansen/test","hdfs://hadoop-master:9000/usr/hansen/test01");
		// createFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello.txt");
		// renameFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello.txt","hdfs://hadoop-master:9000/usr/hansen/test01/hello01.txt");
		// putFile("hello.txt","hdfs://hadoop-master:9000/usr/hansen/test01/hello02.txt");
		// getFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello02.txt","hello02.txt");
		// readFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello02.txt");
		listFiles("hdfs://hadoop-master:9000/usr/hansen/test01/");
	}

}
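
The class above only creates empty files or copies local files in; it never writes content directly. As a complement, a hypothetical writeFile helper (not part of the original code) could be added to HDFSTest roughly like this, using FileSystem.create; it needs one extra import, org.apache.hadoop.fs.FSDataOutputStream:

	// Hypothetical addition to HDFSTest: write a UTF-8 string into an HDFS file
	public static void writeFile(String hdfsPath, String content) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(hdfsPath);
		FSDataOutputStream out = null;
		try {
			out = fs.create(p, true); // overwrite if the file already exists
			out.write(content.getBytes("UTF-8"));
		} finally {
			if (out != null) out.close();
			fs.close();
		}
	}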

Notes on Building the Hadoop Linux Native Libraries

First, a note: if you just want to use the Linux native libraries, Hadoop already ships with them.

Second, if you do want to build them yourself, I recommend building from the Hadoop source tree following the official instructions rather than hand-rolling it the way I did...

If you enjoy tinkering, read on:

1. Copy the following files and directories, keeping the source-tree layout

hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\native
hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\CMakeLists.txt
hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\config.h.cmake
hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\JNIFlags.cmake
hadoop-2.5.2-src\hadoop-hdfs-project\hadoop-hdfs\src\main\native
hadoop-2.5.2-src\hadoop-hdfs-project\hadoop-hdfs\src\CMakeLists.txt (the relative path to the JNIFlags.cmake it depends on may need adjusting)
hadoop-2.5.2-src\hadoop-hdfs-project\hadoop-hdfs\src\config.h.cmake

2. Build libhadoop
2.1. Check and install the dependencies

# gcc, make, and a JDK are required; most people will already have these
# zlib is required
apt-get install zlib1g-dev
# cmake is required
apt-get install cmake

2.2. Generate the Makefile with cmake

cmake ./src/ -DGENERATED_JAVAH=~/Build/hadoop-2.5.2-src/build/hadoop-common-project/hadoop-common/native/javah -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_BZIP2=false -DREQUIRE_SNAPPY=false

2.3. Generate the header files with javah
Three jars are needed: hadoop-common, hadoop-annotations, and guava.

javah org.apache.hadoop.io.compress.lz4.Lz4Compressor
javah org.apache.hadoop.io.compress.lz4.Lz4Decompressor
javah org.apache.hadoop.io.compress.zlib.ZlibCompressor
javah org.apache.hadoop.io.compress.zlib.ZlibDecompressor
javah org.apache.hadoop.io.nativeio.NativeIO 
javah org.apache.hadoop.io.nativeio.SharedFileDescriptorFactory
javah org.apache.hadoop.net.unix.DomainSocket
javah org.apache.hadoop.net.unix.DomainSocketWatcher
javah org.apache.hadoop.security.JniBasedUnixGroupsMapping
javah org.apache.hadoop.security.JniBasedUnixNetgroupsMapping
javah org.apache.hadoop.util.NativeCrc32

Copy the generated header files into the corresponding C source directories.

2.4. Build

make

3. Build libhdfs
3.1. Generate the Makefile with cmake

cmake ./src/ -DGENERATED_JAVAH=~/Build/hadoop-2.5.2-src/build/hadoop-common-project/hadoop-common/native/javah -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_LIBWEBHDFS=false -DREQUIRE_FUSE=false

3.2. Build

make

4. Copy the build output to HADOOP_HOME/lib/mynative

5. Edit /etc/profile and add the following line

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/mynative"

6. Reload the configuration

source /etc/profile
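
To double-check that the native library is actually picked up, you can run bin/hadoop checknative -a, or test it from Java with a minimal sketch like this (NativeCodeLoader is part of hadoop-common):

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
	public static void main(String[] args) {
		// prints true when libhadoop was found on java.library.path and loaded
		System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
	}
}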

Done!

Notes on Building the Hadoop Windows Native Libraries

1. First, download the hadoop-2.5.2-src source code

Copy the directory hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\native
Copy the directory hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\winutils

2. Set the JAVA_HOME and PATH environment variables

3. Generate the javah header files
Unpack hadoop-common-2.5.1.jar, then run:

javah org.apache.hadoop.util.NativeCrc32
javah org.apache.hadoop.io.compress.lz4.Lz4Compressor
javah org.apache.hadoop.io.compress.lz4.Lz4Decompressor
javah org.apache.hadoop.io.nativeio.NativeIO
javah org.apache.hadoop.security.JniBasedUnixGroupsMapping

4. Open winutils.sln, change the output path to ../bin, and build

5. Open native.sln, change the output path to ../bin, fix the path of the winutils.lib reference, and build

6. Copy the exe and dll files to HADOOP_HOME/bin, and you are done

Common problems:
1. The build's target platform must match the bitness of your JVM (x86 vs x64); otherwise the dll cannot be loaded.
2. When something goes wrong, try running winutils.exe first; if it will not start, install the Visual C++ redistributable matching the VS version used for the build and it should work.
3. If you get the message "unable to load native-hadoop library for your platform", just point the JVM at the native library directory via java.library.path in its startup parameters.

If you are in a hurry, you can download the 2.5.2 native binaries from my GitHub: hadoop-windows-native

Setting Up a Hadoop Environment (Part 3)

1. Create directories

bin/hadoop fs -ls /
bin/hadoop fs -mkdir /usr
bin/hadoop fs -mkdir /usr/neohope
bin/hadoop fs -mkdir /usr/neohope/test

2. Copy a file from the local filesystem to HDFS

mkdir ~/test
echo hello hadoop >> ~/test/hello.txt
bin/hadoop fs -put ~/test/hello.txt /usr/neohope/test/

3. View the remote file

bin/hadoop fs -ls /usr/neohope/test
bin/hadoop fs -cat /usr/neohope/test/hello.txt

4. Copy the file from HDFS back to the local filesystem

bin/hadoop fs -get /usr/neohope/test/hello.txt ~/test/hello1.txt
cat ~/test/hello1.txt

5. Command syntax

hadoop@hadoop-master:~/hadoop-2.7.1$ bin/hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-d] [-h] [-R] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Setting Up a Hadoop Environment (Part 2)

1. Unpack Hadoop

su hadoop
cd ~
tar -zxvf /home/neohope/Desktop/hadoop-2.7.1.tar.gz

2. Edit the configuration files under /home/hadoop/hadoop-2.7.1/etc/hadoop/
2.1. core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/tmp</value>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
  </property>

</configuration>

2.2. hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/hdfs/name</value>
  </property>


  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/hdfs/data</value>
  </property>


  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>


  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-master:9001</value>
  </property>


  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>

</configuration>

2.3. mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-master:10020</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-master:19888</value>
  </property>

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>      
  </property>

  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>      
  </property>

</configuration>

2.4. yarn-site.xml

<?xml version="1.0"?>

<!-- Site specific YARN configuration properties -->

<configuration>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>

</configuration>

2.5. slaves

#localhost
hadoop-slave01
hadoop-slave02

3. Set the Java path in the configuration files under /home/hadoop/hadoop-2.7.1/etc/hadoop/
3.1. hadoop-env.sh

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.7.0_79

3.2. yarn-env.sh

# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  #JAVA_HOME=$JAVA_HOME
  JAVA_HOME=/usr/java/jdk1.7.0_79
fi

4. Distribute the hadoop directory to each slave

scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave01:~/
scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave02:~/

5. Initialize the master node (format the NameNode)

cd ~/hadoop-2.7.1
bin/hdfs namenode -format

6. Start Hadoop

cd ~/hadoop-2.7.1
sbin/start-dfs.sh
sbin/start-yarn.sh

7. Check the Hadoop processes

/usr/java/jdk1.7.0_79/bin/jps

8. View the cluster information

http://10.10.10.3:8088

9. View the HDFS filesystem information

http://10.10.10.3:50070

10. Commonly used Hadoop ports

Port  Purpose
9000 fs.defaultFS
9001 dfs.namenode.rpc-address
50070 dfs.namenode.http-address
50470 dfs.namenode.https-address
50100 dfs.namenode.backup.address
50105 dfs.namenode.backup.http-address
50090 dfs.namenode.secondary.http-address
50091 dfs.namenode.secondary.https-address
50020 dfs.datanode.ipc.address
50075 dfs.datanode.http.address
50475 dfs.datanode.https.address
50010 dfs.datanode.address
8480 dfs.journalnode.rpc-address
8481 dfs.journalnode.https-address
8032 yarn.resourcemanager.address
8088 yarn.resourcemanager.webapp.address
8090 yarn.resourcemanager.webapp.https.address
8030 yarn.resourcemanager.scheduler.address
8031 yarn.resourcemanager.resource-tracker.address
8033 yarn.resourcemanager.admin.address
8042 yarn.nodemanager.webapp.address
8040 yarn.nodemanager.localizer.address
8188 yarn.timeline-service.webapp.address
10020 mapreduce.jobhistory.address
19888 mapreduce.jobhistory.webapp.address
2888 ZooKeeper: port the leader listens on for follower connections
3888 ZooKeeper: used for leader election
2181 ZooKeeper: port that listens for client connections
60010 hbase.master.info.port
60000 hbase.master.port
60030 hbase.regionserver.info.port
60020 hbase.regionserver.port
8080 hbase.rest.port
10000 hive.server2.thrift.port
9083 hive.metastore.uris