Hive Environment Setup 02

1. Start Hadoop

2. Start the metastore

./hive --service metastore &
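
Before starting the CLI, it is worth confirming that the metastore is actually up and listening on its Thrift port (9083 by default, and the port configured in hive.metastore.uris in the setup notes below). A minimal sketch:

jps | grep RunJar          # the metastore started this way shows up as a RunJar process
netstat -an | grep 9083    # assumes netstat is available on the box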

3. Start Hive

./hive

4. Create a table

show tables;
create table inpatient(
PATIENT_NO String COMMENT 'inpatient number',
NAME String COMMENT 'name',
SEX_CODE String COMMENT 'sex code',
BIRTHDATE TIMESTAMP COMMENT 'date of birth',
BALANCE_COST String COMMENT 'total cost'
)
COMMENT 'basic inpatient information'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
STORED AS TEXTFILE;

5. Inspect the table

show tables;
describe inpatient;

6. Load data

# load from HDFS
./hadoop fs -put ~/Deploy/testdata/inpatient.txt /usr/hadoop/
load data inpath '/usr/hadoop/inpatient.txt' into table inpatient;
# load from the local filesystem
load data local inpath '/home/hadoop/Deploy/testdata/inpatient.txt' into table inpatient;
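
To sanity-check the load, a quick count also works from outside the interactive shell (a minimal sketch; hive -e runs a single statement and exits, run from Hive's bin directory like the commands above):

./hive -e "select count(*) from inpatient;"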

7. Now let's run a first query to see how many patients there are of each sex

hive> select count(*),sex_code from inpatient group by sex_code;
Query ID = hadoop_20161221104618_d530ab5e-d1c9-48c9-a5da-dff47f81786d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1482286286190_0005, Tracking URL = http://hadoop-master:8088/proxy/application_1482286286190_0005/
Kill Command = /home/hadoop/Deploy/hadoop-2.5.2/bin/hadoop job  -kill job_1482286286190_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-12-21 10:46:39,640 Stage-1 map = 0%,  reduce = 0%
2016-12-21 10:47:08,810 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.39 sec
2016-12-21 10:47:33,013 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.5 sec
MapReduce Total cumulative CPU time: 5 seconds 500 msec
Ended Job = job_1482286286190_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.5 sec   HDFS Read: 43317170 HDFS Write: 26 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 500 msec
OK
4       "0"
60479   "F"
54928   "M"
Time taken: 77.198 seconds, Fetched: 3 row(s)

8. Drop the table

drop table inpatient;

9. Run dfs commands (from within the Hive CLI)

dfs -ls /

Hive Environment Setup 01

1. Install MySQL

2. Create Hive's metastore database in MySQL

CREATE DATABASE hive CHARACTER SET latin1;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* to 'hive'@'localhost';
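
Before moving on, you can verify that the new account works (a small sketch; -p takes the password set above):

mysql -u hive -phive -h localhost -e "show databases;"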

If you accidentally created the database above with the utf8 character set (as I did), you will hit the following error:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes.

The fix is:

alter database hive character set latin1;

3. Download hive-1.2.1 and extract it
(*If you plan to follow the later posts, I recommend hadoop-2.5.2, hbase-1.1.2, hive-1.2.1 and spark-2.0.0)

4. Download the MySQL JDBC driver mysql-connector-java-xxx.jar and put it into Hive's lib directory

5. Replace Hadoop's share/hadoop/yarn/lib/jline-xxx.jar with the jline-2.12.jar from Hive's lib directory
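
In shell terms, something like the following (a sketch: the Hive install path and the exact jline version under yarn/lib are assumptions, adjust to your layout):

cd /home/hadoop/Deploy/hadoop-2.5.2/share/hadoop/yarn/lib
mv jline-*.jar jline.jar.bak                              # keep the old jar around, out of the classpath
cp ~/Deploy/apache-hive-1.2.1-bin/lib/jline-2.12.jar .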

6. Copy hive-default.xml.template to hive-site.xml
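
In shell terms (the Hive install path is an assumption, as above):

cd ~/Deploy/apache-hive-1.2.1-bin/conf
cp hive-default.xml.template hive-site.xml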

7. Edit hive-site.xml; the changes fall into the following groups
7.1. MySQL-related settings

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <!--value>org.apache.derby.jdbc.EmbeddedDriver</value-->
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>

7.2. Hadoop-related settings (the directories below must already exist on HDFS; a sketch for creating them follows this block)

  <property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://hadoop-master:9000/hive/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://hadoop-master:9000/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.user.install.directory</name>
    <value>hdfs://hadoop-master:9000/hive/usr</value>
    <description>
      If hive (in tez mode only) cannot find a usable hive jar in "hive.jar.directory", 
      it will upload the hive jar to "hive.user.install.directory/user.name"
      and use it to run queries.
    </description>
  </property>
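
A minimal sketch for creating those HDFS directories (run from Hadoop's bin directory; 733 matches the scratchdir permission mentioned in the description above, the warehouse permission is an assumption for a simple single-user setup):

./hadoop fs -mkdir -p /hive/tmp /hive/warehouse /hive/usr
./hadoop fs -chmod 733 /hive/tmp
./hadoop fs -chmod g+w /hive/warehouse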

7.3. Hive-related settings

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop-master:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

7.4. Hive local path settings
Point the values below at directories that actually exist on your machine, according to your deployment.

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>

8. Configuration complete :)

Hadoop CRUD Operations (Java)

All the required jars ship with Hadoop itself; the example below needs at least the following:

commons-cli-1.2.jar
commons-collections-3.2.1.jar
commons-configuration-1.6.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
guava-11.0.2.jar
hadoop-auth-2.7.1.jar
hadoop-common-2.7.1.jar
hadoop-hdfs-2.7.1.jar
htrace-core-3.1.0-incubating.jar
log4j-1.2.17.jar
protobuf-java-2.5.0.jar
servlet-api.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar

The code is as follows (a compile-and-run sketch appears after the listing):

package com.neohope.hadoop.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSTest {

	static Configuration hdfsConfig;
	static {
		hdfsConfig = new Configuration();
		hdfsConfig.addResource(new Path("etc/hadoop/core-site.xml"));
		hdfsConfig.addResource(new Path("etc/hadoop/hdfs-site.xml"));
	}

	// Create a directory
	public static void createDirectory(String dirPath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(dirPath);
		try {
			fs.mkdirs(p);
		} finally {
			fs.close();
		}
	}

	// Delete a directory (deleteOnExit defers the actual delete until fs.close() runs in the finally block)
	public static void deleteDirectory(String dirPath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(dirPath);
		try {
			fs.deleteOnExit(p);
		} finally {
			fs.close();
		}
	}

	// Rename a directory
	public static void renameDirectory(String oldDirPath, String newDirPath)
			throws IOException {
		renameFile(oldDirPath, newDirPath);
	}

	// List files
	public static void listFiles(String dirPath) throws IOException {
		FileSystem hdfs = FileSystem.get(hdfsConfig);
		Path listf = new Path(dirPath);
		try {
			FileStatus statuslist[] = hdfs.listStatus(listf);
			for (FileStatus status : statuslist) {
				System.out.println(status.getPath().toString());
			}
		} finally {
			hdfs.close();
		}
	}

	// Create a file
	public static void createFile(String filePath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(filePath);
		try {
			fs.createNewFile(p);
		} finally {
			fs.close();
		}
	}

	// Delete a file (deleteOnExit defers the actual delete until fs.close())
	public static void deleteFile(String filePath) throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path p = new Path(filePath);
		try {
			fs.deleteOnExit(p);
		} finally {
			fs.close();
		}
	}

	// Rename a file
	public static void renameFile(String oldFilePath, String newFilePath)
			throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path oldPath = new Path(oldFilePath);
		Path newPath = new Path(newFilePath);
		try {
			fs.rename(oldPath, newPath);
		} finally {
			fs.close();
		}
	}

	// Upload a file
	public static void putFile(String locaPath, String hdfsPath)
			throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path src = new Path(locaPath);
		Path dst = new Path(hdfsPath);
		try {
			fs.copyFromLocalFile(src, dst);
		} finally {
			fs.close();
		}
	}

	// Download a file
	public static void getFile(String hdfsPath, String locaPath)
			throws IOException {
		FileSystem fs = FileSystem.get(hdfsConfig);
		Path src = new Path(hdfsPath);
		Path dst = new Path(locaPath);
		try {
			fs.copyToLocalFile(false, src, dst, true);
		} finally {
			fs.close();
		}
	}

	// Read a file and print it line by line
	public static void readFile(String hdfsPath) throws IOException {
		FileSystem hdfs = FileSystem.get(hdfsConfig);
		Path filePath = new Path(hdfsPath);

		InputStream in = null;
		BufferedReader buff = null;
		try {
			in = hdfs.open(filePath);
			buff = new BufferedReader(new InputStreamReader(in));
			String str = null;
			while ((str = buff.readLine()) != null) {
				System.out.println(str);
			}
		} finally {
			if (buff != null) buff.close();
			if (in != null) in.close();
			hdfs.close();
		}
	}

	public static void main(String[] args) throws IOException {
		System.setProperty("HADOOP_USER_NAME", "hadoop");
		// createDirectory("hdfs://hadoop-master:9000/usr");
		// createDirectory("hdfs://hadoop-master:9000/usr/hansen");
		// createDirectory("hdfs://hadoop-master:9000/usr/hansen/test");
		// renameDirectory("hdfs://hadoop-master:9000/usr/hansen/test","hdfs://hadoop-master:9000/usr/hansen/test01");
		// createFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello.txt");
		// renameFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello.txt","hdfs://hadoop-master:9000/usr/hansen/test01/hello01.txt");
		// putFile("hello.txt","hdfs://hadoop-master:9000/usr/hansen/test01/hello02.txt");
		// getFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello02.txt","hello02.txt");
		// readFile("hdfs://hadoop-master:9000/usr/hansen/test01/hello02.txt");
		listFiles("hdfs://hadoop-master:9000/usr/hansen/test01/");
	}

}
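
For reference, a sketch of compiling and running the class from a plain shell (the jar and source locations are assumptions; note that the relative etc/hadoop paths in the static block are resolved against the working directory):

# assume the jars listed above are collected in ./lib and the source sits under ./src
mkdir -p classes
javac -cp "lib/*" -d classes src/com/neohope/hadoop/test/HDFSTest.java
java -cp "classes:lib/*" com.neohope.hadoop.test.HDFSTest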

Hadoop Linux Native Build Notes

First, a note: if you just want the Linux native libraries, Hadoop already ships with them.

If you do want to build them yourself, I recommend building from the Hadoop source tree following the official instructions instead of hand-rolling it like I did...

If you enjoy tinkering, read on:

1. Copy the following files and directories, preserving the source-tree layout

hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\native
hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\CMakeLists.txt
hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\config.h.cmake
hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\JNIFlags.cmake
hadoop-2.5.2-src\hadoop-hdfs-project\hadoop-hdfs\src\main\native
hadoop-2.5.2-src\hadoop-hdfs-project\hadoop-hdfs\src\CMakeLists.txt (you may need to adjust the relative path to the JNIFlags.cmake it includes)
hadoop-2.5.2-src\hadoop-hdfs-project\hadoop-hdfs\src\config.h.cmake

2. Build libhadoop
2.1. Check and install the dependencies

# gcc, make and a JDK are required; you most likely have these already
# zlib is required
apt-get install zlib1g-dev
# cmake is required
apt-get install cmake

2.2. Generate the Makefile with cmake

cmake ./src/ -DGENERATED_JAVAH=~/Build/hadoop-2.5.2-src/build/hadoop-common-project/hadoop-common/native/javah -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_BZIP2=false -DREQUIRE_SNAPPY=false

2.3. Generate the JNI header files with javah
Three jars are needed on the classpath: hadoop-common, hadoop-annotations and guava (a classpath sketch follows the command list).

javah org.apache.hadoop.io.compress.lz4.Lz4Compressor
javah org.apache.hadoop.io.compress.lz4.Lz4Decompressor
javah org.apache.hadoop.io.compress.zlib.ZlibCompressor
javah org.apache.hadoop.io.compress.zlib.ZlibDecompressor
javah org.apache.hadoop.io.nativeio.NativeIO 
javah org.apache.hadoop.io.nativeio.SharedFileDescriptorFactory
javah org.apache.hadoop.net.unix.DomainSocket
javah org.apache.hadoop.net.unix.DomainSocketWatcher
javah org.apache.hadoop.security.JniBasedUnixGroupsMapping
javah org.apache.hadoop.security.JniBasedUnixNetgroupsMapping
javah org.apache.hadoop.util.NativeCrc32
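
The javah calls above need those three jars on the classpath; one way to set it (the jar file names are assumptions, match them to the jars in your build):

export CLASSPATH=hadoop-common-2.5.2.jar:hadoop-annotations-2.5.2.jar:guava-11.0.2.jar:$CLASSPATH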

Copy the generated header files into the corresponding C source directories.

2.4. Build

make

3. Build libhdfs
3.1. Generate the Makefile with cmake

cmake ./src/ -DGENERATED_JAVAH=~/Build/hadoop-2.5.2-src/build/hadoop-common-project/hadoop-common/native/javah -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_LIBWEBHDFS=false -DREQUIRE_FUSE=false

3.2. Build

make

4. Copy the built libraries to HADOOP_HOME/lib/mynative

5. Edit /etc/profile and add the line below

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/mynative"

6. Reload the profile

source /etc/profile

Done!

Hadoop Windows Native Build Notes

1. First, download the hadoop-2.5.2-src source code

Copy the directory hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\native
Copy the directory hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\winutils

2. Set the JAVA_HOME and PATH environment variables

3. Generate the JNI header files with javah
Extract hadoop-common-2.5.1.jar, then run

javah org.apache.hadoop.util.NativeCrc32
javah org.apache.hadoop.io.compress.lz4.Lz4Compressor
javah org.apache.hadoop.io.compress.lz4.Lz4Decompressor
javah org.apache.hadoop.io.nativeio.NativeIO
javah org.apache.hadoop.security.JniBasedUnixGroupsMapping

4. Open winutils.sln, change the output path to ../bin, and build

5. Open native.sln, change the output path to ../bin, fix the path of the winutils.lib reference, and build

6. Copy the resulting exe and dll files to HADOOP_HOME/bin. Done.

Common issues:
1. The build platform must match the Java bitness (x86 vs x64), otherwise the DLL cannot be loaded.
2. When something goes wrong, try running winutils.exe first; if it will not start, installing the Visual C++ redistributable (vcredist) for your Visual Studio version usually fixes it.
3. If you see "unable to load native-hadoop library for your platform", just point the JVM at the native library path via its startup options.

If you are in a hurry, the 2.5.2 native binaries can be downloaded from my GitHub: hadoop-windows-native

Hadoop Environment Setup (Part 3)

1. Create directories

bin/hadoop fs -ls /
bin/hadoop fs -mkdir /usr
bin/hadoop fs -mkdir /usr/neohope
bin/hadoop fs -mkdir /usr/neohope/test

2. Copy a file from the local filesystem to HDFS

mkdir ~/test
echo hello hadoop >> ~/test/hello.txt
bin/hadoop fs -put ~/test/hello.txt /usr/neohope/test/

3. View the remote files

bin/hadoop fs -ls /usr/neohope/test
bin/hadoop fs -cat /usr/neohope/test/hello.txt

4. Copy a file from HDFS back to the local filesystem

bin/hadoop fs -get /usr/neohope/test/hello.txt ~/test/hello1.txt
cat ~/test/hello1.txt
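
When the test files are no longer needed they can be removed again (the -rm and -rmdir options appear in the syntax summary below):

bin/hadoop fs -rm /usr/neohope/test/hello.txt
bin/hadoop fs -rmdir /usr/neohope/test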

5. Command syntax

hadoop@hadoop-master:~/hadoop-2.7.1$ bin/hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-d] [-h] [-R] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Hadoop Environment Setup (Part 2)

1. Extract Hadoop

su hadoop
cd ~
tar -zxvf /home/neohope/Desktop/hadoop-2.7.1.tar.gz

2. Edit the configuration files under /home/hadoop/hadoop-2.7.1/etc/hadoop/
2.1. core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/tmp</value>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
  </property>

</configuration>

2.2. hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/hdfs/name</value>
  </property>


  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/hdfs/data</value>
  </property>


  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>


  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-master:9001</value>
  </property>


  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>

</configuration>

2.3. mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-master:10020</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-master:19888</value>
  </property>

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>      
  </property>

  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>      
  </property>

</configuration>

2.4. yarn-site.xml

<?xml version="1.0"?>

<!-- Site specific YARN configuration properties -->

<configuration>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>

</configuration>

2.5. slaves

#localhost
hadoop-slave01
hadoop-slave02

3. Set the Java path in the scripts under /home/hadoop/hadoop-2.7.1/etc/hadoop/
3.1. hadoop-env.sh

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.7.0_79

3.2. yarn-env.sh

# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  #JAVA_HOME=$JAVA_HOME
  JAVA_HOME=/usr/java/jdk1.7.0_79
fi

4. Distribute the Hadoop directory to each slave

scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave01:~/
scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave02:~/

5. Initialize (format) the NameNode on the master

cd ~/hadoop-2.7.1
bin/hdfs namenode -format

6. Start Hadoop

cd ~/hadoop-2.7.1
sbin/start-dfs.sh
sbin/start-yarn.sh
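
The matching stop scripts sit next to the start scripts, which is handy when a configuration change calls for a restart:

cd ~/hadoop-2.7.1
sbin/stop-yarn.sh
sbin/stop-dfs.sh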

7. Check the Hadoop processes

/usr/java/jdk1.7.0_79/bin/jps

8. View cluster information

http://10.10.10.3:8088

9. View HDFS file system information

http://10.10.10.3:50070

10. Commonly used Hadoop ports

Port Purpose
9000 fs.defaultFS
9001 dfs.namenode.rpc-address
50070 dfs.namenode.http-address
50470 dfs.namenode.https-address
50100 dfs.namenode.backup.address
50105 dfs.namenode.backup.http-address
50090 dfs.namenode.secondary.http-address
50091 dfs.namenode.secondary.https-address
50020 dfs.datanode.ipc.address
50075 dfs.datanode.http.address
50475 dfs.datanode.https.address
50010 dfs.datanode.address
8480 dfs.journalnode.rpc-address
8481 dfs.journalnode.https-address
8032 yarn.resourcemanager.address
8088 yarn.resourcemanager.webapp.address
8090 yarn.resourcemanager.webapp.https.address
8030 yarn.resourcemanager.scheduler.address
8031 yarn.resourcemanager.resource-tracker.address
8033 yarn.resourcemanager.admin.address
8042 yarn.nodemanager.webapp.address
8040 yarn.nodemanager.localizer.address
8188 yarn.timeline-service.webapp.address
10020 mapreduce.jobhistory.address
19888 mapreduce.jobhistory.webapp.address
2888 ZooKeeper: port the Leader uses to listen for Follower connections
3888 ZooKeeper: used for Leader election
2181 ZooKeeper: port that listens for client connections
60010 hbase.master.info.port
60000 hbase.master.port
60030 hbase.regionserver.info.port
60020 hbase.regionserver.port
8080 hbase.rest.port
10000 hive.server2.thrift.port
9083 hive.metastore.uris

Hadoop Environment Setup (Part 1)

1. Environment

VMWare8
Debian6
JDK7
Hadoop 2.7.1 (the version used for this first deployment)
*If you plan to follow the later posts, I recommend hadoop-2.5.2, hbase-1.1.2, hive-1.2.1 and spark-2.0.0

2. Install the virtual machine and VMware Tools

su
apt-get install gcc
apt-get install linux-headers-$(uname -r)
apt-get install build-essential
./vmware-install.pl

Set up a shared folder and copy the required files into the VM.
Alternatively, once SSH is working inside the VM, copy the files over with scp or WinSCP.

3. Use NAT networking with the NIC set to DHCP
Edit the configuration file /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

If needed (usually it is not), edit /etc/resolv.conf

nameserver xxx.xxx.xxx.xxx

Restart the network interface

su
ifconfig eth0 down
ifconfig eth0 up

4. Install the software Hadoop needs

su
apt-get install openssh-server
apt-get install ssh
apt-get install rsync
mkdir /usr/java
cd /usr/java
tar -zxvf /home/neohope/Desktop/jdk-7u79-linux-x64.gz

5. Create the hadoop user

sudo adduser hadoop -home /home/hadoop

6. Set the environment variables
Edit /etc/profile and append the following

export JAVA_HOME=/usr/java/jdk1.7.0_79
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Quick test:

source /etc/profile
cd $JAVA_HOME
echo $HADOOP_HOME

7. Switch to the hadoop user and set up passwordless SSH
7.1. RSA

# Ubuntu 16.04 and later no longer accept DSA keys by default
# generate an RSA key pair
su hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost

7.2. DSA

# Ubuntu 16.04 and later disable DSA by default, so DSA support has to be enabled by hand
# edit the sshd configuration
sudo vi /etc/ssh/sshd_config
# add the following lines
PermitRootLogin no
PermitEmptyPasswords no
PasswordAuthentication yes
PubkeyAuthentication yes
ChallengeResponseAuthentication no
PubkeyAcceptedKeyTypes=+ssh-dss

# reload sshd
systemctl reload sshd

# generate a DSA key pair
su hadoop
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh localhost

8. Clone the VM twice, then give the three VMs their hostnames and static IPs (NAT); after the changes the layout is

Hostname IP
hadoop-master 10.10.10.3
hadoop-slave01 10.10.10.4
hadoop-slave02 10.10.10.5
Host machine 10.10.10.1
Gateway 10.10.10.2

Taking hadoop-master as an example, its configuration is as follows (the slave differences are sketched after the block)

#/etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 10.10.10.3
netmask 255.0.0.0
gateway 10.10.10.2
#dns-nameservers 114.114.114.114
#/etc/hosts
127.0.0.1	localhost
10.10.10.3	hadoop-master
10.10.10.4	hadoop-slave01
10.10.10.5	hadoop-slave02
#/etc/hostname
hadoop-master
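
The two slaves differ only in the static address and the hostname; for hadoop-slave01, for example, the relevant lines become (a sketch based on the table above):

#/etc/network/interfaces on hadoop-slave01 (rest identical to the master)
address 10.10.10.4
#/etc/hostname on hadoop-slave01
hadoop-slave01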

9. Once all three VMs are up, they can reach each other directly over SSH

su hadoop
ssh hadoop-master
ssh hadoop-slave01
ssh hadoop-slave02

10. The scope of passwordless SSH login that Hadoop requires (by IP and by hostname) is listed below; a key-distribution sketch follows the list:
1) The NameNode can log in to every DataNode without a password
2) The SecondaryNameNode can log in to every DataNode without a password
3) The NameNode can log in to itself without a password
4) The SecondaryNameNode can log in to itself without a password
5) The NameNode can log in to the SecondaryNameNode without a password
6) The SecondaryNameNode can log in to the NameNode without a password
7) Each DataNode can log in to itself without a password
8) DataNodes do not need passwordless login to the NameNode, the SecondaryNameNode, or other DataNodes
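
One straightforward way to satisfy the list above is to push the hadoop user's public key (generated in step 7) to every node with ssh-copy-id; a sketch:

su hadoop
ssh-copy-id hadoop@hadoop-master
ssh-copy-id hadoop@hadoop-slave01
ssh-copy-id hadoop@hadoop-slave02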