Hadoop Environment Setup (Part 2)

1. Unpack Hadoop

su hadoop
cd ~
tar -zxvf /home/neohope/Desktop/hadoop-2.7.1.tar.gz

2. Edit the configuration files under /home/hadoop/hadoop-2.7.1/etc/hadoop/
2.1 core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>

  <!-- fs.default.name is the deprecated alias of fs.defaultFS; kept here for older clients -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/tmp</value>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>

</configuration>
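
hadoop.tmp.dir points at a local directory inside the install tree; creating it up front (as the hadoop user) avoids permission surprises on first start. A minimal sketch, assuming the path from the config above:

# create the directory referenced by hadoop.tmp.dir
mkdir -p /home/hadoop/hadoop-2.7.1/tmp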

2.2 hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/hdfs/name</value>
  </property>


  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.1/hdfs/data</value>
  </property>


  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>


  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-master:9001</value>
  </property>


  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>

</configuration>
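
The NameNode and DataNode directories configured above can also be created up front. Note that dfs.replication is set to 3 while this cluster only has two DataNodes, so HDFS will report under-replicated blocks unless the value is lowered to 2. A sketch, assuming the paths above:

# create the metadata and block storage directories (repeat on every node once the tree is distributed)
mkdir -p /home/hadoop/hadoop-2.7.1/hdfs/name
mkdir -p /home/hadoop/hadoop-2.7.1/hdfs/data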

2.3 mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-master:10020</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-master:19888</value>
  </property>

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>      
  </property>

  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>      
  </property>

</configuration>
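
In the Hadoop 2.7.1 distribution this file only ships as a template, so copy it before editing:

cd /home/hadoop/hadoop-2.7.1/etc/hadoop
cp mapred-site.xml.template mapred-site.xml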

2.4 yarn-site.xml

<?xml version="1.0"?>

<!-- Site specific YARN configuration properties -->

<configuration>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>

</configuration>
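
A single typo in any of these XML files makes the daemons fail at startup with a parse error. A quick sanity check with xmllint (assuming libxml2-utils is installed, e.g. via apt-get install libxml2-utils) catches malformed XML before anything is started:

cd /home/hadoop/hadoop-2.7.1/etc/hadoop
# prints an error for broken XML, "OK" otherwise
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout "$f" && echo "$f OK"
done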

2.5 slaves

#localhost
hadoop-slave01
hadoop-slave02
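
The slaves file simply lists, one host name per line, the machines on which start-dfs.sh and start-yarn.sh will launch DataNode and NodeManager daemons. Once the slave VMs from Part 1 are up, a quick loop confirms they resolve and accept ssh:

# should print each slave's host name without asking for a password
for h in hadoop-slave01 hadoop-slave02; do
  ssh "$h" hostname
done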

3. Set the Java path in the scripts under /home/hadoop/hadoop-2.7.1/etc/hadoop/
3.1 hadoop-env.sh

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.7.0_79

3.2 yarn-env.sh

# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  #JAVA_HOME=$JAVA_HOME
  JAVA_HOME=/usr/java/jdk1.7.0_79
fi
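
The same change can be scripted instead of edited by hand; a sketch using sed against the stock hadoop-env.sh (it assumes the original export JAVA_HOME=${JAVA_HOME} line is still present):

cd /home/hadoop/hadoop-2.7.1/etc/hadoop
# point hadoop-env.sh at the JDK installed in Part 1
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.7.0_79|' hadoop-env.sh
# confirm both scripts now reference the JDK
grep -n 'JAVA_HOME=/usr/java' hadoop-env.sh yarn-env.sh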

4. Distribute the hadoop directory to each slave

scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave01:~/
scp -r /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave02:~/
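
scp copies the whole tree, including anything already written under tmp/ or logs/ on the master. Since rsync was installed in Part 1, a hedged alternative that skips those directories (the exclude list is an assumption about what has been generated locally):

rsync -a --exclude 'tmp/' --exclude 'logs/' /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave01:~/
rsync -a --exclude 'tmp/' --exclude 'logs/' /home/hadoop/hadoop-2.7.1 hadoop@hadoop-slave02:~/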

5. Initialize the master node (format the NameNode)

cd ~/hadoop-2.7.1
bin/hdfs namenode -format
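
Formatting writes a fresh clusterID into the NameNode metadata directory. Re-running the format after DataNodes have already registered leaves them with a stale clusterID and they will refuse to start until their data directories are cleared, so only format once. To inspect what the format produced (path taken from hdfs-site.xml above):

cat /home/hadoop/hadoop-2.7.1/hdfs/name/current/VERSION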

6. Start Hadoop

cd ~/hadoop-2.7.1
sbin/start-dfs.sh
sbin/start-yarn.sh
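
start-dfs.sh and start-yarn.sh do not start the MapReduce JobHistory server, even though mapred-site.xml above points it at hadoop-master:10020/19888, so start it separately if job history is needed:

cd ~/hadoop-2.7.1
sbin/mr-jobhistory-daemon.sh start historyserver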

7. Check the Hadoop processes

/usr/java/jdk1.7.0_79/bin/jps
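
On the master, jps should list NameNode, SecondaryNameNode and ResourceManager (plus JobHistoryServer if it was started); each slave should show DataNode and NodeManager. A sketch that checks all three machines in one pass:

for h in hadoop-master hadoop-slave01 hadoop-slave02; do
  echo "== $h =="
  ssh "$h" /usr/java/jdk1.7.0_79/bin/jps
done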

8. View cluster information (YARN ResourceManager web UI)

http://10.10.10.3:8088

9. View HDFS file system information (NameNode web UI)

http://10.10.10.3:50070
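
Both web UIs can also be probed from the command line; a sketch using curl (assuming curl is installed, otherwise wget works the same way):

# should print 200 for each UI once the daemons are up
curl -s -o /dev/null -w '%{http_code}\n' http://10.10.10.3:8088
curl -s -o /dev/null -w '%{http_code}\n' http://10.10.10.3:50070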

10. Commonly used Hadoop ports

Port Purpose
9000 fs.defaultFS
9001 dfs.namenode.rpc-address
50070 dfs.namenode.http-address
50470 dfs.namenode.https-address
50100 dfs.namenode.backup.address
50105 dfs.namenode.backup.http-address
50090 dfs.namenode.secondary.http-address
50091 dfs.namenode.secondary.https-address
50020 dfs.datanode.ipc.address
50075 dfs.datanode.http.address
50475 dfs.datanode.https.address
50010 dfs.datanode.address
8480 dfs.journalnode.rpc-address
8481 dfs.journalnode.https-address
8032 yarn.resourcemanager.address
8088 yarn.resourcemanager.webapp.address
8090 yarn.resourcemanager.webapp.https.address
8030 yarn.resourcemanager.scheduler.address
8031 yarn.resourcemanager.resource-tracker.address
8033 yarn.resourcemanager.admin.address
8042 yarn.nodemanager.webapp.address
8040 yarn.nodemanager.localizer.address
8188 yarn.timeline-service.webapp.address
10020 mapreduce.jobhistory.address
19888 mapreduce.jobhistory.webapp.address
2888 ZooKeeper, port on which the Leader listens for Follower connections
3888 ZooKeeper, used for Leader election
2181 ZooKeeper, port for client connections
60010 hbase.master.info.port
60000 hbase.master.port
60030 hbase.regionserver.info.port
60020 hbase.regionserver.port
8080 hbase.rest.port
10000 hive.server2.thrift.port
9083 hive.metastore.uris
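
Only a subset of these ports will actually be listening on any one node. A quick way to see which are bound on the master (run as root so netstat can show process names for all sockets):

netstat -lntp | grep -E '9000|50070|50090|8088|8032|10020'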

Hadoop Environment Setup (Part 1)

1. Prerequisites

VMWare8
Debian6
JDK7
Hadoop 2.7.1 (the version used for this first deployment)
*If you plan to follow the later articles, the recommended versions are hadoop-2.5.2, hbase-1.1.2, hive-1.2.1 and spark-2.0.0.

2. Install the virtual machine and VMware Tools

su
apt-get install gcc
apt-get install linux-headers-$(uname -r)
apt-get install build-essential
./vmware-install.pl

Set up a shared folder and copy the required files into the virtual machine.
Alternatively, once ssh is set up on the VM, copy the files over with scp or WinSCP, as sketched below.
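
For example, once sshd is running inside the guest, the installers can be pushed from the host in one command (the user name and IP match the paths and addresses used elsewhere in this article; adjust as needed):

scp jdk-7u79-linux-x64.gz hadoop-2.7.1.tar.gz neohope@10.10.10.3:/home/neohope/Desktop/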

3. Set the network mode to NAT and the NIC to DHCP
Edit the configuration file /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

If needed (usually it is not), edit /etc/resolv.conf

nameserver xxx.xxx.xxx.xxx

Restart the network interface

su
ifconfig eth0 down
ifconfig eth0 up
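
Then verify that eth0 actually obtained a DHCP lease:

ifconfig eth0
# default route and DNS, if you need to check them
route -n
cat /etc/resolv.conf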

4. Install the software Hadoop needs

su
apt-get install openssh-server
apt-get install ssh
apt-get install rsync
mkdir /usr/java
cd /usr/java
tar -zxvf /home/neohope/Desktop/jdk-7u79-linux-x64.gz
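
A quick check that the JDK unpacked where the rest of this series expects it:

/usr/java/jdk1.7.0_79/bin/java -version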

5. Create the hadoop user

sudo adduser --home /home/hadoop hadoop

6. Set environment variables
Edit /etc/profile and add the following:

export JAVA_HOME=/usr/java/jdk1.7.0_79
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Quick test:

source /etc/profile
cd $JAVA_HOME
echo $HADOOP_HOME

7. Switch to the hadoop user and set up passwordless ssh
7.1 RSA keys

#From Ubuntu 16.04 onwards, dsa keys are disabled by default, so rsa is the safer choice
#generate an rsa key pair
su hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
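
sshd is strict about permissions under ~/.ssh; if the last command still prompts for a password, tightening them usually fixes it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys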

7.2 DSA keys

#From Ubuntu 16.04 onwards, dsa keys are disabled by default; dsa support has to be enabled manually
#edit the sshd configuration
sudo vi /etc/ssh/sshd_config
#add the following lines
PermitRootLogin no
PermitEmptyPasswords no
PasswordAuthentication yes
PubkeyAuthentication yes
ChallengeResponseAuthentication no
PubkeyAcceptedKeyTypes=+ssh-dss

#reload sshd
systemctl reload sshd

#generate a dsa key pair
su hadoop
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh localhost

8. Clone the virtual machine twice, then give the three VMs their host names and static IPs (still NAT). After the changes the layout is:

Host name        IP
hadoop-master    10.10.10.3
hadoop-slave01   10.10.10.4
hadoop-slave02   10.10.10.5
host machine     10.10.10.1
gateway (NAT)    10.10.10.2

Taking hadoop-master as an example, its configuration is:

#/etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
address 10.10.10.3
netmask 255.0.0.0
gateway 10.10.10.2
#dns-nameservers 114.114.114.114
#/etc/hosts
127.0.0.1	localhost
10.10.10.3	hadoop-master
10.10.10.4	hadoop-slave01
10.10.10.5	hadoop-slave02
#/etc/hostname
hadoop-master
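
After editing these files, apply the changes; a reboot is the simplest way to pick up the new host name, but they can also be applied in place:

su
/etc/init.d/networking restart
hostname hadoop-master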

9. Once all three VMs are running, they can ssh to each other directly

su hadoop
ssh hadoop-master
ssh hadoop-slave01
ssh hadoop-slave02

10. Passwordless ssh required by Hadoop (by IP and by host name); a verification sketch follows this list:
1) The NameNode must be able to ssh to every DataNode without a password
2) The SecondaryNameNode must be able to ssh to every DataNode without a password
3) The NameNode must be able to ssh to itself without a password
4) The SecondaryNameNode must be able to ssh to itself without a password
5) The NameNode must be able to ssh to the SecondaryNameNode without a password
6) The SecondaryNameNode must be able to ssh to the NameNode without a password
7) Each DataNode must be able to ssh to itself without a password
8) DataNodes do not need passwordless ssh to the NameNode, the SecondaryNameNode, or other DataNodes
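
A sketch for verifying the list above from each node; BatchMode makes ssh fail immediately instead of prompting, so any host that errors out still needs key setup:

# run on hadoop-master, then repeat from the other nodes as required
for h in hadoop-master hadoop-slave01 hadoop-slave02; do
  ssh -o BatchMode=yes "$h" 'echo ok from $(hostname)'
done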