Hadoop 2.2.0 + Spark 0.9.0 Distributed Cluster Setup



Software Versions

JDK: jdk-7u45-linux-x64.tar

Spark: spark-0.9.0-incubating-bin-hadoop2.tgz

Scala: scala-2.10.3.tgz

Hadoop: hadoop-2.2.0_x64.tar.gz

Cluster Layout

adai1: Master/NameNode/ResourceManager/SecondaryNameNode

adai2: Worker/DataNode/NodeManager

adai3: Worker/DataNode/NodeManager

JDK Installation

Extract the archive:

tar -xvf jdk-7u45-linux-x64.tar

Move the extracted directory to /usr/lib:

sudo mv jdk1.7.0_45 /usr/lib/

Set the environment variables:

sudo vi /etc/profile

Append the following at the end:

#set java environment

export JAVA_HOME=/usr/lib/jdk1.7.0_45

export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin

export JRE_HOME=$JAVA_HOME/jre

Reload the profile so the variables take effect:

source /etc/profile

Check the version:

java -version
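As a quick sanity check that the profile was actually picked up (a minimal sketch; the paths are simply the ones exported above):

echo $JAVA_HOME    (should print /usr/lib/jdk1.7.0_45)

which java    (should point into $JAVA_HOME/bin unless another JDK sits earlier on the PATH)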

Hosts Configuration

sudo vi /etc/hosts

127.0.0.1 localhost

192.168.1.11 adai1

192.168.1.12 adai2

192.168.1.13 adai3

Copy the hosts file to the other nodes:

scp /etc/hosts adai@192.168.1.12:/etc/hosts

scp /etc/hosts adai@192.168.1.13:/etc/hosts
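Note that scp can only write to /etc if the adai account has the necessary rights on the remote machines; otherwise copy the file to the home directory first and move it into place with sudo. A quick check from adai1 that the names resolve (a minimal sketch):

ping -c 1 adai2

ping -c 1 adai3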

Passwordless SSH Login

First configure passwordless login from adai1 to adai2. On adai1:

sudo apt-get install ssh

ssh-keygen -t rsa   (generate an RSA key pair)

cd ~/.ssh     (enter the hidden .ssh directory under the home directory)

cat id_rsa.pub >> authorized_keys   (append id_rsa.pub to the authorized keys; after this step, ssh localhost should log in to the local machine without a password, though the very first connection may still prompt)

scp ~/.ssh/id_rsa.pub adai@adai2:~/   (copy id_rsa.pub from adai1 to adai2)

On adai2:

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys   (on adai2, append the id_rsa.pub copied from adai1 to .ssh/authorized_keys)

chmod 600 .ssh/authorized_keys   (the permissions on this file must be 600)

Then configure passwordless login from adai1 to adai3 in the same way.
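To confirm the keys are in place, the following should run from adai1 without any password prompt (assuming the adai account exists on every node):

ssh adai2 hostname

ssh adai3 hostname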

Hadoop Configuration

Hadoop 2.2.0 for 64-bit machines has to be compiled from source yourself; alternatively, 64-bit builds that others have already compiled can be downloaded online.

Extract:

tar -zxvf hadoop-2.2.0_x64.tar.gz

Move the directory to /opt/:

sudo mv hadoop-2.2.0/ /opt/

Set the environment variables:

sudo vi /etc/profile

Add:

export HADOOP_HOME=/opt/hadoop-2.2.0

export PATH=$PATH:$HADOOP_HOME/bin

export YARN_HOME=/opt/hadoop-2.2.0

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Reload the profile:

source /etc/profile
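A quick check that the new variables took effect (the hadoop script is now on the PATH):

hadoop version    (should report Hadoop 2.2.0)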

Go into the /opt/hadoop-2.2.0/etc/hadoop/ directory and edit the configuration files.

vi hadoop-env.sh

export JAVA_HOME=/usr/lib/jdk1.7.0_45

export HADOOP_HOME=/opt/hadoop-2.2.0

export PATH=$PATH:$HADOOP_HOME/bin

vi core-site.xml

<configuration>

       <property>

               <name>fs.defaultFS</name>

               <value>hdfs://adai1:9000</value>

       </property>

       <property>

               <name>io.file.buffer.size</name>

               <value>131072</value>

       </property>

       <property>

               <name>hadoop.tmp.dir</name>

               <value>file:/opt/hadoop-2.2.0/tmp_hadoop</value>

                <description>A base for other temporary directories.</description>

       </property>

       <property>

                <name>hadoop.proxyuser.adai.hosts</name>

                <value>*</value>

       </property>

       <property>

               <name>hadoop.proxyuser.adai.groups</name>

                <value>*</value>

       </property>

</configuration>
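hadoop.tmp.dir above points inside the install tree; as a precaution it can be created up front so the daemons do not trip over a missing path (optional, Hadoop normally creates it itself if the parent directory is writable):

mkdir -p /opt/hadoop-2.2.0/tmp_hadoop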

Rename the mapred-site.xml.template template file:

mv mapred-site.xml.template mapred-site.xml

vi mapred-site.xml

<configuration>

       <property>

               <name>mapreduce.framework.name</name>

                <value>yarn</value>

       </property>

       <property>

                <name>mapreduce.jobhistory.address</name>

               <value>adai1:10020</value>

       </property>

       <property>

               <name>mapreduce.jobhistory.webapp.address</name>

               <value>adai1:19888</value>

       </property>

</configuration>

vi hdfs-site.xml

<configuration>

       <property>

               <name>dfs.namenode.secondary.http-address</name>

               <value>adai1:9001</value>

       </property>

       <property>

               <name>dfs.namenode.name.dir</name>

               <value>file:/opt/hadoop-2.2.0/dfs/name</value>

       </property>

       <property>

               <name>dfs.datanode.data.dir</name>

               <value>file:/opt/hadoop-2.2.0/dfs/data</value>

       </property>

       <property>

               <name>dfs.replication</name>

                <value>2</value>

       </property>

       <property>

               <name>dfs.webhdfs.enabled</name>

                <value>true</value>

       </property>

</configuration>
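Likewise, the NameNode and DataNode directories referenced above can be created ahead of time on the relevant nodes (optional; shown here as a sketch):

mkdir -p /opt/hadoop-2.2.0/dfs/name /opt/hadoop-2.2.0/dfs/data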

vi yarn-site.xml

<configuration>

       <property>

               <name>yarn.nodemanager.aux-services</name>

               <value>mapreduce_shuffle</value>

       </property>

       <property>

               <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

               <value>org.apache.hadoop.mapred.ShuffleHandler</value>

       </property>

       <property>

               <name>yarn.resourcemanager.address</name>

               <value>adai1:8032</value>

       </property>

       <property>

               <name>yarn.resourcemanager.scheduler.address</name>

               <value>adai1:8030</value>

       </property>

       <property>

               <name>yarn.resourcemanager.resource-tracker.address</name>

                <value>adai1:8031</value>

       </property>

       <property>

               <name>yarn.resourcemanager.admin.address</name>

               <value>adai1:8033</value>

       </property>

       <property>

               <name>yarn.resourcemanager.webapp.address</name>

               <value>adai1:8088</value>

       </property>

</configuration>

vi slaves

adai2

adai3

Copy the configuration files to the other nodes.
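One way to distribute the whole installation, assuming the same /opt layout and an adai account on every node (a sketch; if /opt is not writable by adai, copy into the home directory and move it with sudo):

scp -r /opt/hadoop-2.2.0 adai@adai2:/opt/

scp -r /opt/hadoop-2.2.0 adai@adai3:/opt/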

Format the NameNode:

bin/hadoop namenode -format

Start Hadoop:

sbin/start-all.sh

Use jps to check the cluster status.
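Given the roles listed at the top, jps on adai1 should show NameNode, SecondaryNameNode and ResourceManager, while adai2 and adai3 should show DataNode and NodeManager. The web UIs are at http://adai1:50070 (HDFS) and http://adai1:8088 (YARN) by default. A small HDFS smoke test (the path is arbitrary):

bin/hdfs dfs -mkdir /test

bin/hdfs dfs -ls /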

Scala Installation

Extract:

tar -zxvf scala-2.10.3.tgz

Move it to /usr/lib:

sudo mv scala-2.10.3 /usr/lib/

Set the environment variables:

sudo vi /etc/profile

Append the following:

export SCALA_HOME=/usr/lib/scala-2.10.3

export PATH=$PATH:$SCALA_HOME/bin

Copy the profile to the other nodes, then reload it so the variables take effect:

source /etc/profile

Check the version:

scala -version

Spark Configuration

Extract:

tar -zxvf spark-0.9.0-incubating-bin-hadoop2.tgz

Move it into /opt, renaming the directory to spark:

sudo mv spark-0.9.0-incubating-bin-hadoop2/ /opt/spark

Edit the environment variables:

sudo vi /etc/profile

Append the following:

export SPARK_HOME=/opt/spark 

export PATH=$PATH:$SPARK_HOME/bin

Reload the profile:

source /etc/profile

Go to the conf directory and edit the configuration files.

mv spark-env.sh.template spark-env.sh

vi spark-env.sh

export SCALA_HOME=/usr/lib/scala-2.10.3

export JAVA_HOME=/usr/lib/jdk1.7.0_45

export SPARK_MASTER_IP=192.168.1.11

export HADOOP_HOME=/opt/hadoop-2.2.0

export SPARK_HOME=/opt/spark

export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop/

vi slaves

adai2

adai3

Copy the configuration files to the other nodes.
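As with Hadoop, Scala and Spark can be pushed out with scp, assuming the same paths on every node (a sketch; adjust if the target directories are root-owned):

scp -r /usr/lib/scala-2.10.3 adai@adai2:/usr/lib/

scp -r /opt/spark adai@adai2:/opt/

Repeat for adai3.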

On the Master node, run:

sbin/start-all.sh

Run the example that ships with Spark:

./bin/run-example org.apache.spark.examples.SparkPi spark://192.168.1.11:7077
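To try the cluster interactively, Spark 0.9 reads the master URL from the MASTER environment variable when the shell is launched; the tiny parallelize job below is just a smoke test (a minimal sketch):

MASTER=spark://192.168.1.11:7077 ./bin/spark-shell

scala> sc.parallelize(1 to 1000).count()    (should return 1000)

The standalone master's web UI is served on port 8080 of adai1 by default.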

