Alex's Hadoop Tutorial for Beginners: Lesson 2, Part 1: Installing Hadoop


This continues from the previous tutorial: http://blog.csdn.net/nsrainbow/article/details/36629339

This tutorial installs the CDH5 distribution of Hadoop on CentOS 6 using yum. If you have not added the yum repository yet, please see the previous tutorial: http://blog.csdn.net/nsrainbow/article/details/36629339

Installing in non-HA mode


1. Import the repository key

$ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
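If you want to confirm the import worked, rpm tracks imported GPG keys as pseudo-packages, so the Cloudera key should now show up in this list:

$ rpm -qa gpg-pubkey*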


2. Install CDH5

2.1 Install on the Resource Manager host

$ sudo yum clean all
$ sudo yum install hadoop-yarn-resourcemanager -y




2.2 Install on the NameNode host

$ sudo yum clean all
$ sudo yum install hadoop-hdfs-namenode -y




2.3 Install on the Secondary NameNode host

$ sudo yum clean all
$ sudo yum install hadoop-hdfs-secondarynamenode -y




2.4 Install nodemanager, datanode, and mapreduce (the official docs say to install these on every host except the Resource Manager, but we only have one machine right now, so they go on this one too)

$ sudo yum clean all
$ sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce -y




2.5 Install hadoop-mapreduce-historyserver and hadoop-yarn-proxyserver (the official docs say to pick one host in the cluster for these, but we only have one, so they also go here)

$ sudo yum clean all
$ sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver -y




2.6 Install hadoop-client (the client package users connect to Hadoop with; the official docs say to install it on the client machines, but we will just put it on this one)

$ sudo yum clean all
$ sudo yum install hadoop-client -y
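A quick way to sanity-check the client installation is to ask it for its version; it should report a Hadoop 2.x build packaged by Cloudera:

$ hadoop version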




3. Deploy CDH

3.1 Configure the hostname (the default is localhost)

First, check whether your hostname has been set:
$ sudo vim /etc/sysconfig/network
HOSTNAME=localhost.localdomain




If HOSTNAME is localhost.localdomain, change it:
HOSTNAME=myhost.mydomain.com
Then run the following so the change takes effect immediately:
$ sudo hostname myhost.mydomain.com




Check that the setting took effect:
$ sudo uname -a
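It is also worth making sure the new hostname actually resolves to your machine's IP, otherwise the Hadoop daemons may fail to bind or reach each other. A minimal /etc/hosts entry would look like this (192.168.1.100 is a placeholder; substitute your server's real address):

$ sudo vim /etc/hosts
192.168.1.100   myhost.mydomain.com myhost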




3.2 Edit the configuration files

First switch to the root user, so we don't have to prefix every command with sudo; the rest of this tutorial is written from root's point of view.
$ sudo su -
$ cd /etc/hadoop/conf
$ vim core-site.xml


Inside <configuration>...</configuration>, add:
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://myhost.mydomain.com:8020</value>
</property>
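fs.defaultFS tells every Hadoop component where the HDFS namenode lives. For reference, if core-site.xml was empty before, a minimal sketch of the whole file after this change would look like the following (your file may carry other properties):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myhost.mydomain.com:8020</value>
  </property>
</configuration>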

Edit hdfs-site.xml:


$ vim hdfs-site.xml


Inside <configuration>...</configuration>, add:
<property>
 <name>dfs.permissions.superusergroup</name>
 <value>hadoop</value>
</property>
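This property makes members of the hadoop group HDFS superusers. The CDH packages installed above should already have created that group; you can verify with getent:

$ getent group hadoop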




3.3 Configure the storage directories

On the namenode machine, configure hdfs-site.xml to set where the name metadata is stored (we only have one machine, so it is both the namenode and the datanode):
$ vim hdfs-site.xml


Change dfs.name.dir to dfs.namenode.name.dir (dfs.name.dir is deprecated) and update its value. Typically /data or /home/data is where the big data disks are mounted, so it is best to point the storage directories at folders under such a path:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/hadoop-hdfs/1/dfs/nn</value>
</property>



On the datanode, configure hdfs-site.xml to set where the actual data blocks are stored (again, our one machine plays both roles):
$ vim hdfs-site.xml


Add the dfs.datanode.data.dir property (dfs.data.dir is deprecated):
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/hadoop-hdfs/1/dfs/dn,file:///data/hadoop-hdfs/2/dfs/dn</value>
</property>
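Since this is a single-node cluster, it may also be worth lowering the block replication factor, which defaults to 3; with only one datanode, every block would otherwise be reported as under-replicated. This is optional and only appropriate for a one-machine setup:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>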




Create these directories:
$ mkdir -p /data/hadoop-hdfs/1/dfs/nn
$ mkdir -p /data/hadoop-hdfs/1/dfs/dn
$ mkdir -p /data/hadoop-hdfs/2/dfs/dn




Change the owner of the directories:
$ chown -R hdfs:hdfs /data/hadoop-hdfs/1/dfs/nn /data/hadoop-hdfs/1/dfs/dn /data/hadoop-hdfs/2/dfs/dn




Change the directory permissions:
$ chmod 700 /data/hadoop-hdfs/1/dfs/nn


3.4 Format the namenode

$ sudo -u hdfs hdfs namenode -format
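If the format succeeded, the name directory configured above should now contain a current subdirectory holding an fsimage and a VERSION file; a quick listing confirms it:

$ ls /data/hadoop-hdfs/1/dfs/nn/current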




3.5 Configure the Secondary NameNode

Add to hdfs-site.xml:
<property>
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:50070</value>
  <description>
    The address and the base port on which the dfs NameNode Web UI will listen.
  </description>
</property>
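This is the address the Secondary NameNode uses to fetch metadata from the namenode over HTTP. On a real multi-host cluster you could also pin down where the Secondary NameNode's own web interface listens; the value below is the Hadoop 2 default, so setting it explicitly is optional:

<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>0.0.0.0:50090</value>
</property>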




3.6 Start Hadoop

$ for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start ; done
Starting Hadoop nodemanager:                               [  OK  ]
starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-xmseapp03.ehealthinsurance.com.out
Starting Hadoop proxyserver:                               [  OK  ]
starting proxyserver, logging to /var/log/hadoop-yarn/yarn-yarn-proxyserver-xmseapp03.ehealthinsurance.com.out
Starting Hadoop resourcemanager:                           [  OK  ]
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-xmseapp03.ehealthinsurance.com.out
Starting Hadoop datanode:                                  [  OK  ]
starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-xmseapp03.ehealthinsurance.com.out
Starting Hadoop namenode:                                  [  OK  ]
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-xmseapp03.ehealthinsurance.com.out
Starting Hadoop secondarynamenode:                         [  OK  ]
starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-xmseapp03.ehealthinsurance.com.out
...




Once everything has started successfully, check with jps:
$ jps
17033 NodeManager
16469 DataNode
17235 ResourceManager
17522 JobHistoryServer
16565 NameNode
16680 SecondaryNameNode
17593 Jps
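Beyond jps, a quick command-line smoke test is worthwhile. The commands below follow the usual CDH post-install steps: create the /tmp directory that MapReduce jobs expect, open up its permissions, and list the HDFS root (skip the first two if /tmp already exists on your cluster):

$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -ls /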



4. Client test

Open your browser and go to http://<hadoop server ip>:50070
If you see something like
Hadoop Administration
DFS Health/Status
then you have successfully reached the Hadoop web console.
