Hadoop 2.0: configuring YARN, the new-generation MapReduce framework



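I had been using the 0.20 MapReduce framework all along, and today I set up YARN. I haven't written a blog post in a long time, so here is one: the main configuration files are pasted below. With these settings in place, wordcount and my own test jobs all ran fine.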

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>
    <property> 
        <name>fs.defaultFS</name> 
        <value>hdfs://fc20:9000</value> 
    </property> 


    <property> 
        <name>hadoop.tmp.dir</name> 
        <value>/home/ljq/hadoop/tmp</value>
    </property> 


    <property>
        <name>hadoop.native.lib</name>
        <value>false</value>
        <description>Should native hadoop libraries, if present, be used.</description>
    </property>
</configuration>


hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>
    <property> 
        <name>dfs.replication</name> 
        <value>1</value> 
    </property> 


    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/ljq/hadoop/dfs/name</value>
    </property>


    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/ljq/hadoop/dfs/data</value>
    </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <!-- Must be lowercase "yarn"; any other casing fails with java.lang.IllegalStateException (Invalid shuffle port number -1) -->
        <value>yarn</value>
    </property>

    <property>
        <!-- IPC address of the JobHistory server; the server must be started manually -->
        <name>mapreduce.jobhistory.address</name>
        <value>fc20:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>fc20:10021</value>
    </property>


</configuration>


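yarn-site.xml: every "yarn" in this file must be lowercase; otherwise the corresponding port settings are not picked up and the daemons start on their default ports.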

<?xml version="1.0"?>
<configuration>


<!-- Site specific YARN configuration properties -->


    <property>
        <description>The hostname of the RM.</description> 
        <name>yarn.resourcemanager.hostname</name> 
        <value>fc20</value> 
    </property>

    <property> 
        <!-- Hadoop 2.0.x spelling; from Hadoop 2.2 on the value must be mapreduce_shuffle -->
        <name>yarn.nodemanager.aux-services</name> 
        <value>mapreduce.shuffle</value> 
    </property>
 
    <property> 
        <description>The address of the applications manager interface in the RM.</description> 
        <name>yarn.resourcemanager.address</name> 
        <value>fc20:18004</value> 
    </property> 

    <property> 
        <description>The address of the scheduler interface.</description> 
        <name>yarn.resourcemanager.scheduler.address</name> 
        <value>fc20:18003</value> 
    </property> 

    <property> 
        <description>The address of the RM web application.</description> 
        <name>yarn.resourcemanager.webapp.address</name> 
        <value>fc20:18008</value> 
    </property> 
  
    <property> 
        <description>The address of the resource tracker interface.</description> 
        <name>yarn.resourcemanager.resource-tracker.address</name> 
        <value>fc20:18006</value> 
    </property> 


</configuration>
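
Since both mapred-site.xml and yarn-site.xml break when "yarn" is not lowercase, a quick scan of the files can catch the mistake before the daemons are started. The following is only a rough sketch: the function name `find_miscased_yarn` is made up for this post, and it does a plain text scan rather than real XML parsing.

```shell
# Print every line of the given files where "yarn" appears in a spelling
# other than all-lowercase (for example "Yarn" or "YARN").
# Exit status: 0 if at least one such line was found, 1 otherwise.
# (Hypothetical helper, not part of Hadoop.)
find_miscased_yarn() {
    awk '
        {
            rest = $0
            gsub(/yarn/, "", rest)            # drop correctly spelled occurrences
            if (tolower(rest) ~ /yarn/) {     # anything left over is mis-cased
                printf "%s:%d: %s\n", FILENAME, FNR, $0
                found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$@"
}

# Usage (the location of the config directory is an assumption):
#   find_miscased_yarn "$HADOOP_CONF_DIR"/mapred-site.xml "$HADOOP_CONF_DIR"/yarn-site.xml
```

No output and a non-zero exit status mean that every occurrence is already lowercase.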

netstat shows that the configured ports are listening.
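
One way to make the netstat check repeatable is to filter its output for the ports configured above. This is a sketch only: `report_ports` and `EXPECTED_PORTS` are names invented here, the port list is copied from the files in this post (the JobHistory ports are left out because that server starts separately), and the matching assumes the usual `netstat -tln` column layout.

```shell
# Ports taken from core-site.xml and yarn-site.xml in this post.
EXPECTED_PORTS="9000 18003 18004 18006 18008"

# Read `netstat -tln` output on stdin and print the state of each expected
# port. Returns non-zero if any port is missing.
# (Hypothetical helper, not part of Hadoop.)
report_ports() {
    listing=$(cat)
    missing=0
    for port in $EXPECTED_PORTS; do
        # Match ":<port>" followed by whitespace, i.e. the local-address column.
        if printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]"; then
            echo "$port listening"
        else
            echo "$port MISSING"
            missing=1
        fi
    done
    return $missing
}

# Usage:
#   netstat -tln | report_ports
```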

wordcount and my own MapReduce jobs both run to completion!

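Note that the JobHistory server does not start automatically along with HDFS and YARN; it has to be started by hand, which puzzled me for several days.
To start (or stop) the JobHistory process: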

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
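
Because the history server is easy to forget, it may help to check for it before starting it. The sketch below assumes that `jps` from the JDK is on the PATH and that the server shows up in its output as `JobHistoryServer`; the function name `history_server_running` is made up for this post.

```shell
# Read `jps` output on stdin; succeed if a JobHistoryServer JVM is listed.
# jps prints one "<pid> <main class>" pair per line.
# (Hypothetical helper, not part of Hadoop.)
history_server_running() {
    grep -q '[[:space:]]JobHistoryServer$'
}

# Usage (assumes HADOOP_HOME is set):
#   jps | history_server_running ||
#       "$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh" start historyserver
```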

