Spark 0.8.1 Distributed Installation


Spark package: spark-0.8.1-incubating-bin-hadoop2.tgz

Operating system: CentOS 6.4

JDK version: jdk1.7.0_21

1. Cluster Mode

1.1 Install Hadoop

Create three CentOS virtual machines in VMware Workstation with the hostnames master, slaver01, and slaver02, configure passwordless SSH login among them, install Hadoop, and start the Hadoop cluster. For details, see my post on the hadoop-2.2.0 distributed installation.
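For reference, passwordless SSH from the master to every node can be set up roughly as follows (a sketch run as root on the master; ssh-copy-id prompts for each node's password once):

$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ ssh-copy-id root@master
$ ssh-copy-id root@slaver01
$ ssh-copy-id root@slaver02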

1.2 Install Scala

Scala must be installed on all three machines; follow the steps in my Spark installation post (the commands here assume a scala-2.10.3 directory under the home directory). The JDK was already installed as part of the Hadoop setup.
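A minimal Scala install on the master might look like this (a sketch; the archive URL is an assumption, adjust it to the mirror and version you actually use):

$ cd
$ wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
$ tar -zxf scala-2.10.3.tgz

Then copy the directory to both workers: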

$ cd
$ scp -r scala-2.10.3 root@slaver01:~
$ scp -r scala-2.10.3 root@slaver02:~

 

1.3 Install and Configure Spark on the Master

Unpack the tarball and rename the directory:

$ tar -zxf spark-0.8.1-incubating-bin-hadoop2.tgz
$ mv spark-0.8.1-incubating-bin-hadoop2 spark-0.8.1

Set SCALA_HOME and JAVA_HOME in conf/spark-env.sh:

$ cd ~/spark-0.8.1/conf
$ mv spark-env.sh.template spark-env.sh
$ vi spark-env.sh
# add the following lines
export SCALA_HOME=/root/scala-2.10.3
export JAVA_HOME=/usr/java/jdk1.7.0_21
# save and exit

In conf/slaves, add the hostname of each Spark worker, one per line.

$ vim slaves
slaver01
slaver02
# save and exit
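These hostnames must resolve on every machine. If DNS is not configured, entries in /etc/hosts on each node will do (the IP addresses below are placeholders; substitute your own):

$ vim /etc/hosts
192.168.1.100 master
192.168.1.101 slaver01
192.168.1.102 slaver02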

(Optional) Set the SPARK_HOME environment variable and add $SPARK_HOME/bin to PATH:

$ vim /etc/profile
# add the following lines at the end
export SPARK_HOME=$HOME/spark-0.8.1
export PATH=$PATH:$SPARK_HOME/bin
# save and exit vim
# make the profile take effect immediately
$ source /etc/profile
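A quick sanity check that the variable took effect (running as root, the path should expand to /root/spark-0.8.1):

$ echo $SPARK_HOME
/root/spark-0.8.1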

1.4 Install and Configure Spark on All Workers

Since this directory has already been configured on the master, simply copy it to every worker. Note that Spark must live at the same path on all three machines, because the master logs in to each worker and runs commands assuming the worker's Spark path is identical to its own.

$ cd
$ scp -r spark-0.8.1 root@slaver01:~
$ scp -r spark-0.8.1 root@slaver02:~
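Optionally, confirm the layout is identical on a worker before starting the cluster:

$ ssh slaver01 ls ~/spark-0.8.1/conf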

1.5 Start the Spark Cluster

On the master, run:

$ cd ~/spark-0.8.1
$ bin/start-all.sh

Check that the processes have started:

[root@master ~]# jps
9664 Jps
7993 Master
9276 SecondaryNameNode
9108 NameNode
8105 Worker
9416 ResourceManager
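Each worker should also be running a Worker process alongside the Hadoop DataNode and NodeManager daemons; this can be checked from the master:

$ ssh slaver01 jps
$ ssh slaver02 jps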

Browse the master's web UI (http://master:8080 by default). You should now see all of the worker nodes, along with their CPU core counts, memory, and other details.
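Without a browser, a quick reachability check of the UI port works too (optional; prints the HTTP status code):

$ curl -s -o /dev/null -w "%{http_code}\n" http://master:8080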

1.6 Run the Bundled Spark Examples

Run SparkPi:

[root@master ~]# cd ~/spark-0.8.1
[root@master spark-0.8.1]# ./run-example org.apache.spark.examples.SparkPi spark://master:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Pi is roughly 3.14236
[root@master spark-0.8.1]#
 

Run SparkLR (logistic regression):

[root@master spark-0.8.1]# ./run-example org.apache.spark.examples.SparkLR spark://master:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Initial w: (-0.8066603352924779, -0.5488747509304204, -0.7351625370864459, 0.8228539509375878, -0.6662446067860872, -0.33245457898921527, 0.9664202269036932, -0.20407887461434115, 0.4120993933386614, -0.8125908063470539)
On iteration 1
On iteration 2
On iteration 3
On iteration 4
On iteration 5
Final w: (5816.075967498865, 5222.008066011391, 5754.751978607454, 3853.1772062206846, 5593.565827145932, 5282.387874201054, 3662.9216051953435, 4890.78210340607, 4223.371512250292, 5767.368579668863)
[root@master spark-0.8.1]#

Run SparkKMeans:
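SparkKMeans reads a text file of space-separated points; the arguments are the master URL, the input file, the number of clusters k, and the convergence threshold. If kmeans_data.txt does not exist yet, a small sample like the following works (the values are only an illustration, and the reported centers depend on the data and the random initial centers):

$ cat > kmeans_data.txt << EOF
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
EOF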

[root@master spark-0.8.1]# ./run-example org.apache.spark.examples.SparkKMeans spark://master:7077 ./kmeans_data.txt 2 1
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Final centers:
(0.1, 0.1, 0.1)
(9.2, 9.2, 9.2)
[root@master spark-0.8.1]#

1.7 Read a File from HDFS and Run WordCount

$ cd ~/spark-0.8.1
$ wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
$ hadoop fs -put pg20417.txt ./
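# optional: confirm the file landed in HDFS
$ hadoop fs -ls pg20417.txt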
[root@master spark-0.8.1]# MASTER=spark://master:7077 ./spark-shell
Welcome to
      ____              __  
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.8.1
      /_/                  
 
Using Scala version 2.9.3 (Java HotSpot(TM) Client VM, Java 1.7.0_21)
Initializing interpreter...
log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Creating SparkContext...
Spark context available as sc.
Type in expressions to have them evaluated.
Type :help for more information.
scala> val file = sc.textFile("hdfs://master:9000/user/root/pg20417.txt")
file: org.apache.spark.rdd.RDD[String] = MappedRDD[9] at textFile at <console>:12
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(java.lang.String, Int)] = MapPartitionsRDD[14] at reduceByKey at <console>:14

scala> count.collect()
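Instead of collecting everything to the driver, the counts can also be written back to HDFS from the same session (a sketch; the output path is an assumption):

scala> count.saveAsTextFile("hdfs://master:9000/user/root/wordcount_out")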

1.8 Stop the Spark Cluster

$ cd ~/spark-0.8.1
$ bin/stop-all.sh

 

 

 
