Setting Up a Hadoop 1.2.1 Cluster on Ubuntu 12.10 under VMware 9



 

I've been learning Hadoop recently, so I'm writing down the process of building a Hadoop cluster for future reference. The notes cover a lot of small details and may come across as long-winded, but that should make them more useful for newcomers. Enough preamble, let's get to it.

 

Building a five-node Hadoop cluster

1. Environment Overview

Five Ubuntu virtual machines are created in VMware; the details of the environment are as follows:

Virtualization            Operating system              JDK                    Hadoop
VMware Workstation 9      ubuntu-12.10-server-amd64     jdk-7u51-linux-x64     hadoop-1.2.1

 

Hostname      IP address       VM name               Roles
master        192.168.1.30     Ubuntu64-Master       namenode, jobtracker
secondary     192.168.1.39     Ubuntu64-Secondary    secondarynamenode
slaver1       192.168.1.31     Ubuntu64-slaver1      datanode, tasktracker
slaver2       192.168.1.32     Ubuntu64-slaver2      datanode, tasktracker
slaver3       192.168.1.33     Ubuntu64-slaver3      datanode, tasktracker

 

 

2. Setting Up the Virtual Machines

Download the 64-bit Ubuntu Server ISO, which is convenient to install in VMware.

 

Each virtual machine gets one dual-core CPU, 1 GB of RAM and a 20 GB disk. A VMware shared folder (ShareFolder) is configured so the Windows host can pass installation packages to the VMs.

 

Install Ubuntu via VMware's Easy Install and create a hadoop user; Hadoop, ZooKeeper and HBase will all be deployed later under this hadoop user.

 

You can install and configure a single machine first, using master as the template; once it is done, clone the remaining machines with VMware's clone feature and then adjust each clone's IP address and hostname.

 

Create the user

First create the group

sudo addgroup hadoop

Then create the user

sudo adduser -ingroup hadoop hadoop
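The hadoop user runs sudo throughout the rest of this guide. If it was created with adduser as above rather than during installation, it will not have sudo rights yet; a minimal fix, assuming the default Ubuntu sudo group:

# add the hadoop user to the sudo group (log out and back in for it to take effect)
sudo adduser hadoop sudo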

 

Update the package sources

First back up the source list that ships with the system (we are logged in as the hadoop user, hence sudo).

sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup

Edit the source list

sudo vi /etc/apt/sources.list

Source entries found online; paste them into vi

## Official Ubuntu update servers (Europe). This is the official source: slow from China but with no sync delay; usable by China Telecom, China Mobile/Tietong and China Unicom users:

deb http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb-src http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse

 

## Other software provided officially by Ubuntu (third-party closed-source software, etc.):

deb http://archive.canonical.com/ubuntu/ quantal partner

deb http://extras.ubuntu.com/ubuntu/ quantal main

 

## An Ubuntu mirror built and maintained by "Gutou" (hosted in a China Telecom data center in Hangzhou, Zhejiang, on 100 Mbit shared bandwidth), which also carries Deepin and other images:

deb http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse

deb http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse

deb-src http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse

 

## Sohu update servers (1 Gbit China Unicom link in Shandong; the official mainland-China mirror redirects here), which also carry other open-source mirrors:

deb http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse

Run the update command so the system actually refreshes its package index.

sudo apt-get update

 

Install vim

I still can't get used to plain vi, so install vim to replace it.

sudo apt-get install vim

 

Configure the IP address

On Ubuntu, the IP address is changed by editing the /etc/network/interfaces file directly.

sudo vim /etc/network/interfaces

Taking the master host as an example, change it to the following configuration

# The primary network interface

auto eth0

iface eth0 inet static

address 192.168.1.30

netmask 255.255.255.0

network 192.168.1.0

broadcast 192.168.1.255

gateway 192.168.1.1

# dns-* options are implemented by the resolvconf package, if installed

dns-nameservers 8.8.8.8
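For the new address to take effect without a reboot, bring the interface down and up again (or restart networking); a small sketch, assuming the eth0 interface used above and a local console (doing this over ssh will drop the connection):

sudo ifdown eth0 && sudo ifup eth0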

 

Configure the hostname

On Ubuntu the hostname is stored in /etc/hostname, and /etc/hosts holds the hostname-to-IP mappings.

First set the hostname

sudo vim /etc/hostname

Set it to

master

 

Then configure the hostname-to-IP mapping for all hosts

sudo vim /etc/hosts

Set it to the following (add every server in the cluster in one go, so you never have to touch this again)

127.0.0.1       localhost

192.168.1.30    master

192.168.1.31    slaver1

192.168.1.32    slaver2

192.168.1.33    slaver3

192.168.1.39    secondary

The format of an entry in the hosts file is

IP address     hostname     aliases (zero or more, separated by spaces)

 

Clone the system

Clone the configured Ubuntu installation into several copies to build the small five-node cluster, then change the IP address and hostname on each clone.
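After adjusting each clone it is worth checking that every node can resolve and reach all the others by name; a quick check, not in the original notes:

# run on any node; every host should answer
for h in master secondary slaver1 slaver2 slaver3; do ping -c 1 $h; done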

 

3. Installing and Configuring SSH

Install SSH

Install it with apt-get, which is quick and painless.

sudo apt-get install openssh-server

Check whether the ssh service is running with

ps -ef | grep ssh

Output like the following means it is running

hadoop    2147  2105  0 13:11 ?        00:00:00 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu

root      7226     1  0 23:31 ?        00:00:00 /usr/sbin/sshd -D

hadoop    7287  6436  0 23:33 pts/0    00:00:00 grep --color=auto ssh

SSH consists of a client and a server: the client is used to ssh into other machines, and the server provides the ssh service so that users can log in remotely. Ubuntu installs the ssh client by default, so only the ssh server has to be installed.

 

Generate an RSA key pair

As the hadoop user, generate a key pair with ssh-keygen.

ssh-keygen -t rsa

You will be asked whether to protect the key with a passphrase; just leave it empty. If nothing goes wrong, the key pair (id_rsa and id_rsa.pub) appears in the hadoop user's .ssh directory. id_rsa is the private key; it stays on the server and must not be leaked. id_rsa.pub is the public key; it is handed out to every server that should be reachable without a password.

Note: there must be no space between ssh and -keygen. Running ssh-keygen -t rsa -P "" skips the passphrase prompt altogether.

 

Go into the .ssh directory and append the public key to the authorization file (authorized_keys); authorized_keys stores the public keys of all servers.

cat id_rsa.pub >> authorized_keys

In authorized_keys each public key starts with ssh-rsa and ends with username@hostname; the keys of several servers are simply stored one after another, as in the example below.

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDs5A9sjk+44DtptGw4fXm5n0qbpnSnFsqRQJnbyD4DGMG7AOpfrEZrMmRiNJA8GZUIcrN71pHEgQimoQGD5CWyVgi1ctWFrULOnGksgixJj167m+FPdpcCFJwfAS34bD6DoVXJgyjWIDT5UFz+RnElNC14s8F0f/w44EYM49y2dmP8gGmzDQ0jfIgPSknUSGoL7fSFJ7PcnRrqWjQ7iq3B0gwyfCvWnq7OmzO8VKabUnzGYST/lXCaSBC5WD2Hvqep8C9+dZRukaa00g2GZVH3UqWO4ExSTefyUMjsal41YVARMGLEfyZzvcFQ8LR0MWhx2WMSkYp6Z6ARbdHZB4MN hadoop@master

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2Hb6mCi6sd6IczIn/pBbj8L9PMS1ac0tlalex/vlSRj2E6kzUrw/urEUVeO76zFcZgjUgKvoZsNAHGrr1Bfw8FiiDcxPtlIREl2L9Qg8Vd0ozgE22bpuxBTn1Yed/bbJ/VxGJsYbOyRB/mBCvEI4ECy/EEPf5CRMDgiTL9XP86MNJ/kgG3odR6hhSE3Ik/NMARTZySXE90cFB0ELr/Io4SaINy7b7m6ssaP16bO8aPbOmsyY2W2AT/+O726Py6tcxwhe2d9y2tnJiELfrMLUPCYGEx0Z/SvEqWhEvvoGn8qnpPJCGg6AxYaXy8jzSqWNZwP3EcFqmVrg9I5v8mvDd hadoop@slaver1
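If passwordless login later still prompts for a password, it is usually because sshd insists on strict permissions on the key files; a commonly needed adjustment (an extra precaution, not something that came up in the original notes):

# run as the hadoop user on every server
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys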

 

Distribute the public keys

A server hands its public key to the other servers so that it can log in to them without a password. Here all the public keys are collected on one server and the combined file is then pushed back out to every server; this way all five machines can ssh into one another freely without passwords.

 

Distribution uses scp, which needs the ssh service running on both ends; the first scp to a host still asks for a password.

On every server except master, run the matching command below to copy its public key to master.

cd .ssh

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver1

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver2

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver3

scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.secondary

On master, run the following commands to collect the public keys.

cd .ssh

cat id_rsa.pub.slaver1 >> authorized_keys

cat id_rsa.pub.slaver2 >> authorized_keys

cat id_rsa.pub.slaver3 >> authorized_keys

cat id_rsa.pub.secondary >> authorized_keys

On master, run the following commands to distribute the combined authorized_keys file.

scp authorized_keys hadoop@slaver1:/home/hadoop/.ssh/authorized_keys

scp authorized_keys hadoop@slaver2:/home/hadoop/.ssh/authorized_keys

scp authorized_keys hadoop@slaver3:/home/hadoop/.ssh/authorized_keys

scp authorized_keys hadoop@secondary:/home/hadoop/.ssh/authorized_keys

 

Test passwordless login with

ssh slaver1
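To verify all nodes in one go, run a trivial command on each of them over ssh; if no password prompt appears, the key distribution worked (a small sketch):

# should print each hostname without asking for a password
for h in master secondary slaver1 slaver2 slaver3; do ssh $h hostname; done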

 

 

4. Installing and Configuring the JDK

Deploy the JDK

Extract jdk-7u51-linux-x64.tar.gz to /usr/lib/jdk1.7.0_51

sudo tar -zxvf jdk-7u51-linux-x64.tar.gz -C /usr/lib/

 

Configure environment variables

Add the JDK to the global environment variables

sudo vim /etc/profile

Add the following at the bottom

export JAVA_HOME=/usr/lib/jdk1.7.0_51

export JRE_HOME=/usr/lib/jdk1.7.0_51/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

 

Note: on Linux the separator in environment variables is a colon ':', while on Windows it is a semicolon ';'. CLASSPATH must include '.'.

 

Refresh the environment variables with the following command

source /etc/profile
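To confirm the JDK is picked up, check the version and the variable (assuming the paths above):

java -version
echo $JAVA_HOME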

 

Distribute the JDK

Distribute the installed JDK with scp; copying a directory needs the -r flag

scp -r /usr/lib/jdk1.7.0_51 hadoop@slaver1:/usr/lib/

 

Distribute the environment variables

/etc/profile is owned by root, so the scp needs sudo and the file has to go to the root user on slaver1.

sudo scp /etc/profile root@slaver1:/etc/profile
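Note that on a stock Ubuntu install root has no password and cannot log in over ssh, so the scp to root@slaver1 (and the earlier copy into /usr/lib on the slaves) may be refused. A workaround, assuming the hadoop user has sudo rights on the target, is to copy into the hadoop home directory first and move the file into place with sudo:

scp /etc/profile hadoop@slaver1:/home/hadoop/profile.tmp
# -t allocates a terminal so sudo can ask for the password
ssh -t hadoop@slaver1 "sudo mv /home/hadoop/profile.tmp /etc/profile"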

 

 

5. Installing and Configuring Hadoop

Deploy Hadoop

Extract hadoop-1.2.1.tar.gz to /home/hadoop/hadoop-1.2.1

tar -zxvf hadoop-1.2.1.tar.gz -C /home/hadoop/

 

Configure environment variables

Add Hadoop to the global environment variables

sudo vim /etc/profile

Add the following at the bottom

export HADOOP_HOME=/home/hadoop/hadoop-1.2.1

export PATH=$PATH:$HADOOP_HOME/bin

Refresh the environment variables

source /etc/profile
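A quick check that the PATH change took effect:

hadoop version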

 

conf/hadoop-env.sh

export JAVA_HOME=/usr/lib/jdk1.7.0_51

export HADOOP_TASKTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_TASKTRACKER_OPTS"

export HADOOP_LOG_DIR=/home/hadoop/hadoop_home/logs

export HADOOP_MASTER=master:/home/$USER/hadoop-1.2.1

export HADOOP_SLAVE_SLEEP=0.1

HADOOP_MASTER sets the host and directory from which Hadoop rsyncs its configuration; when Hadoop starts, each node syncs its configuration from the master.

HADOOP_SLAVE_SLEEP=0.1 sets the delay (in seconds) between the slaves' sync requests, so that many nodes requesting the configuration at once do not overload the master.

 

conf/core-site.xml

<configuration>

    <property>

        <name>fs.default.name</name>

        <value>hdfs://master:9000</value>

    </property>

    <property>

        <name>hadoop.tmp.dir</name>

        <value>/home/hadoop/hadoop_home/tmp</value>

    </property>

    <property>

         <name>fs.trash.interval</name>

         <value>10080</value>

        <description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>

    </property>

    <property>

        <name>fs.checkpoint.period</name>

        <value>600</value>

         <description>The number of seconds between two periodic checkpoints.</description>

    </property>

    <property>

        <name>fs.checkpoint.size</name>

        <value>67108864</value>

        <description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>

    </property>

</configuration>

 

conf/hdfs-site.xml

<configuration>

    <property>

        <name>dfs.name.dir</name>

        <value>/home/hadoop/hadoop_home/name1,/home/hadoop/hadoop_home/name2</value>

        <description>  </description>

    </property>

    <property>

        <name>dfs.data.dir</name>

        <value>/home/hadoop/hadoop_home/data1,/home/hadoop/hadoop_home/data2</value>

        <description> </description>

    </property>

    <property>

         <name>fs.checkpoint.dir</name>

         <value>/home/hadoop/hadoop_home/namesecondary1,/home/hadoop/hadoop_home/namesecondary2</value>

    </property>

    <property>

        <name>dfs.replication</name>

        <value>3</value>

    </property>

    <property>

         <name>dfs.http.address</name>

         <value>master:50070</value>

    </property>

    <property>

        <name>dfs.https.address</name>

        <value>master:50470</value>

    </property>

    <property>

        <name>dfs.secondary.http.address</name>

        <value>secondary:50090</value>

    </property>

    <property>

        <name>dfs.datanode.address</name>

        <value>0.0.0.0:50010</value>

    </property>

    <property>

        <name>dfs.datanode.ipc.address</name>

        <value>0.0.0.0:50020</value>

    </property>

    <property>

        <name>dfs.datanode.http.address</name>

        <value>0.0.0.0:50075</value>

    </property>

    <property>

        <name>dfs.datanode.https.address</name>

        <value>0.0.0.0:50475</value>

    </property>

</configuration>

 

conf/mapred-site.xml

<configuration>

    <property>

        <name>mapred.job.tracker</name>

        <value>master:9001</value>

    </property>

    <property>

        <name>mapred.local.dir</name>

        <value>/home/hadoop/hadoop_home/local</value>

    </property>

    <property>

        <name>mapred.system.dir</name>

         <value>/home/hadoop/hadoop_home/system</value>

    </property>

    <property>

         <name>mapred.tasktracker.map.tasks.maximum</name>

         <value>5</value>

    </property>

    <property>

        <name>mapred.tasktracker.reduce.tasks.maximum</name>

        <value>5</value>

    </property>

    <property>

        <name>mapred.job.tracker.http.address</name>

        <value>0.0.0.0:50030</value>

    </property>

    <property>

        <name>mapred.task.tracker.http.address</name>

        <value>0.0.0.0:50060</value>

    </property>

</configuration>
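The configuration above points several local paths at /home/hadoop/hadoop_home (tmp, logs, name, data, checkpoint and mapred.local directories). Hadoop creates most of them itself, but creating them up front on every node makes permission problems visible early; an optional sketch:

mkdir -p /home/hadoop/hadoop_home/{tmp,logs,local,name1,name2,data1,data2,namesecondary1,namesecondary2}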

 

conf/masters

secondary

conf/masters holds the hostname of the secondarynamenode. In this setup the secondarynamenode runs on its own server; despite the file name, this file has nothing to do with the namenode.

 

conf/slaves

slaver1

slaver2

slaver3

 

Distribute the Hadoop tree

scp -r /home/hadoop/hadoop-1.2.1 hadoop@slaver1:/home/hadoop/

 

Distribute the environment variables

/etc/profile is owned by root, so the scp needs sudo and the file has to go to the root user on slaver1.

sudo scp /etc/profile root@slaver1:/etc/profile

 

 

6. Starting and Testing Hadoop

Starting the Hadoop cluster

The Hadoop start and stop commands are listed below

Command                                       Purpose
start-all.sh                                  Start the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker
stop-all.sh                                   Stop the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker
start-dfs.sh                                  Start the HDFS daemons: namenode, secondarynamenode, datanode
stop-dfs.sh                                   Stop the HDFS daemons: namenode, secondarynamenode, datanode
start-mapred.sh                               Start the MapReduce daemons: jobtracker, tasktracker
stop-mapred.sh                                Stop the MapReduce daemons: jobtracker, tasktracker
hadoop-daemons.sh start namenode              Start only the namenode daemon
hadoop-daemons.sh stop namenode               Stop only the namenode daemon
hadoop-daemons.sh start datanode              Start only the datanode daemons
hadoop-daemons.sh stop datanode               Stop only the datanode daemons
hadoop-daemons.sh start secondarynamenode     Start only the secondarynamenode daemon
hadoop-daemons.sh stop secondarynamenode      Stop only the secondarynamenode daemon
hadoop-daemons.sh start jobtracker            Start only the jobtracker daemon
hadoop-daemons.sh stop jobtracker             Stop only the jobtracker daemon
hadoop-daemons.sh start tasktracker           Start only the tasktracker daemons
hadoop-daemons.sh stop tasktracker            Stop only the tasktracker daemons

 

Start the Hadoop cluster
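Before the very first start, the HDFS namespace has to be formatted on master (a standard Hadoop 1.x step that is easy to forget; run it only once, since re-formatting wipes the HDFS metadata):

hadoop namenode -format

Then start all the daemons: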

start-all.sh

When stopping Hadoop with stop-all.sh, the datanode logs always contained an error like the following

2014-06-10 15:52:20,216 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.1.30:9000 failed on local exception: java.io.EOFException

        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)

        at org.apache.hadoop.ipc.Client.call(Client.java:1118)

        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)

        at com.sun.proxy.$Proxy5.sendHeartbeat(Unknown Source)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1031)

        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1588)

        at java.lang.Thread.run(Thread.java:744)

Caused by: java.io.EOFException

        at java.io.DataInputStream.readInt(DataInputStream.java:392)

        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:845)

        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:790)

The cause is that the datanodes are stopped after the namenode, so their connection to the namenode fails and the exception above is logged. Looking at the stop scripts, the shutdown order in stop-dfs.sh seems slightly off (in my opinion): the namenode is stopped first and the datanodes afterwards. Reordering it so that the namenode stops last should avoid the connection warning.

After the adjustment the relevant lines look like this

"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR stop datanode

"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode

"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode

With this change the datanode exception no longer shows up in my tests. I'm not sure whether the reordering has any other effect on Hadoop; corrections are welcome.

 

The HDFS web UI is available at http://master:50070.

The MapReduce web UI is available at http://master:50030.

 

Check the state of HDFS from the command line

hadoop dfsadmin -report

hadoop fsck /
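Another quick sanity check is to log in to each node and run jps, confirming the expected daemons are present (NameNode and JobTracker on master, SecondaryNameNode on secondary, DataNode and TaskTracker on the slaves):

jps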

 

Testing the Hadoop cluster

Run the wordcount example that ships with Hadoop to verify that the cluster is working.

First create two input data files.

echo "Hello World Bye World" > text1.txt

echo "Hello Hadoop Goodbye Hadoop" > text2.txt

 

Upload the data files to HDFS

hadoop fs -put text1.txt hdfs://master:9000/user/hadoop/input/text1.txt

hadoop fs -put text2.txt hdfs://master:9000/user/hadoop/input/text2.txt

 

Run the wordcount program

hadoop jar hadoop-examples-1.2.1.jar wordcount input/text*.txt output-0

 

The run log looks like this

14/06/12 01:55:21 INFO input.FileInputFormat: Total input paths to process : 2

14/06/12 01:55:21 INFO util.NativeCodeLoader: Loaded the native-hadoop library

14/06/12 01:55:21 WARN snappy.LoadSnappy: Snappy native library not loaded

14/06/12 01:55:21 INFO mapred.JobClient: Running job: job_201406111818_0001

14/06/12 01:55:22 INFO mapred.JobClient:  map 0% reduce 0%

14/06/12 01:55:28 INFO mapred.JobClient:  map 50% reduce 0%

14/06/12 01:55:30 INFO mapred.JobClient:  map 100% reduce 0%

14/06/12 01:55:36 INFO mapred.JobClient:  map 100% reduce 33%

14/06/12 01:55:37 INFO mapred.JobClient:  map 100% reduce 100%

14/06/12 01:55:38 INFO mapred.JobClient: Job complete: job_201406111818_0001

14/06/12 01:55:38 INFO mapred.JobClient: Counters: 29

14/06/12 01:55:38 INFO mapred.JobClient:   Job Counters

14/06/12 01:55:38 INFO mapred.JobClient:     Launched reduce tasks=1

14/06/12 01:55:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8281

14/06/12 01:55:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/06/12 01:55:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/06/12 01:55:38 INFO mapred.JobClient:     Launched map tasks=2

14/06/12 01:55:38 INFO mapred.JobClient:     Data-local map tasks=2

14/06/12 01:55:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8860

14/06/12 01:55:38 INFO mapred.JobClient:   File Output Format Counters

14/06/12 01:55:38 INFO mapred.JobClient:     Bytes Written=41

14/06/12 01:55:38 INFO mapred.JobClient:   FileSystemCounters

14/06/12 01:55:38 INFO mapred.JobClient:     FILE_BYTES_READ=79

14/06/12 01:55:38 INFO mapred.JobClient:     HDFS_BYTES_READ=272

14/06/12 01:55:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=166999

14/06/12 01:55:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41

14/06/12 01:55:38 INFO mapred.JobClient:   File Input Format Counters

14/06/12 01:55:38 INFO mapred.JobClient:     Bytes Read=50

14/06/12 01:55:38 INFO mapred.JobClient:   Map-Reduce Framework

14/06/12 01:55:38 INFO mapred.JobClient:     Map output materialized bytes=85

14/06/12 01:55:38 INFO mapred.JobClient:     Map input records=2

14/06/12 01:55:38 INFO mapred.JobClient:     Reduce shuffle bytes=85

14/06/12 01:55:38 INFO mapred.JobClient:     Spilled Records=12

14/06/12 01:55:38 INFO mapred.JobClient:     Map output bytes=82

14/06/12 01:55:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=336338944

14/06/12 01:55:38 INFO mapred.JobClient:     CPU time spent (ms)=3010

14/06/12 01:55:38 INFO mapred.JobClient:     Combine input records=8

14/06/12 01:55:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=222

14/06/12 01:55:38 INFO mapred.JobClient:     Reduce input records=6

14/06/12 01:55:38 INFO mapred.JobClient:     Reduce input groups=5

14/06/12 01:55:38 INFO mapred.JobClient:     Combine output records=6

14/06/12 01:55:38 INFO mapred.JobClient:     Physical memory (bytes) snapshot=394276864

14/06/12 01:55:38 INFO mapred.JobClient:     Reduce output records=5

14/06/12 01:55:38 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2918625280

14/06/12 01:55:38 INFO mapred.JobClient:     Map output records=8

 

List the output directory; the presence of the _SUCCESS file shows the job completed successfully, which means the cluster setup is basically sound.

hadoop fs -ls output-0

 

Found 3 items

-rw-r--r--   3 hadoop supergroup          0 2014-06-12 01:55 /user/hadoop/output-0/_SUCCESS

drwxr-xr-x   - hadoop supergroup          0 2014-06-12 01:55 /user/hadoop/output-0/_logs

-rw-r--r--   3 hadoop supergroup         41 2014-06-12 01:55 /user/hadoop/output-0/part-r-00000

 

View the result

hadoop fs -cat output-0/part-r-00000

 

Bye      1
Goodbye  1
Hadoop   2
Hello    2
World    2

This matches the expected result for the input data.
