Running the GraphX test programs on Spark 0.9.1 in cluster mode (LiveJournalPageRank, plus Connected Components)
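The latest version of Spark has been released. The previous release already integrated GraphX, and this one also fixes some bugs. I ran a simple test, but there is very little material online about running Spark in cluster mode: the only guide I found is the EC2 one (see reference 1), and it is quite old, with many commands that have since changed. I really dislike installation blog posts that do not state the versions of the software they use. That should be common sense!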
My platform configuration:
Spark: 0.9.1
Scala: 2.10.4
Hadoop: 1.0.4
JDK: 1.7.0
Master nodes: 1
Worker nodes: 16
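1. Deploy Spark 0.9.1
See my previous blog post.
2. Download the input dataset for the GraphX test programs (download link: soc-LiveJournal1.txt.gz)
If the link no longer works, leave a comment and ask me for the file.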
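The download is gzipped, while the commands in the later steps read soc-LiveJournal1.txt from HDFS, so the file presumably has to be decompressed and uploaded first. Below is a minimal sketch of that step, assuming the archive sits in the current directory, the `hadoop` command from Hadoop 1.0.4 is on the PATH, and `$HDFSIP` is set. The port 9000 is taken from the PageRank command in step 3. `count_edges` is a hypothetical helper for a quick sanity check, and it assumes the usual edge-list layout (comment lines starting with `#`, then one source/destination pair per line).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the non-comment lines (edges) in an edge-list
# file, skipping header lines that start with '#'.
count_edges() {
  grep -vc '^#' "$1"
}

# Decompress and upload only if the downloaded archive is present.
if [ -f soc-LiveJournal1.txt.gz ]; then
  gunzip soc-LiveJournal1.txt.gz
  count_edges soc-LiveJournal1.txt
  # Assumption: the NameNode listens on port 9000, as in step 3.
  hadoop fs -put soc-LiveJournal1.txt "hdfs://$HDFSIP:9000/soc-LiveJournal1.txt"
fi
```

If the edge count looks far too small, the download was probably truncated, and it is worth fetching the file again before starting a job on the cluster.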
3. Run the GraphX PageRank test program
./bin/run-example org.apache.spark.examples.graphx.LiveJournalPageRank spark://$MASTERIP:7077 hdfs://$HDFSIP:9000/soc-LiveJournal1.txt --numEPart=192 --output=pagerank_out
The parameters are explained in the program's own usage message:
Usage: LiveJournalPageRank <master> <edge_list_file>
    [--tol=<tolerance>]
        The tolerance allowed at convergence (smaller => more accurate). Default is 0.001.
    [--output=<output_file>]
        If specified, the file to write the ranks to.
    [--numEPart=<num_edge_partitions>]
        The number of partitions for the graph's edge RDD. Default is 4.
    [--partStrategy=RandomVertexCut | EdgePartition1D | EdgePartition2D | CanonicalRandomVertexCut]
        The way edges are assigned to edge partitions. Default is RandomVertexCut.
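Once the job has finished, the ranks written by `--output=pagerank_out` can be inspected from the shell. The sketch below assumes that the relative path `pagerank_out` resolved to HDFS and that each output line has the form `vertexId<TAB>rank`. That is what I would expect from this version, but look at a few lines before relying on it. `top_ranks` is a hypothetical helper, and `sort -g` is the GNU/BSD general-numeric option, used so that ranks printed in scientific notation still sort correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "vertexId<TAB>rank" lines on stdin and print
# the N highest-ranked vertices (default 10).
top_ranks() {
  sort -t "$(printf '\t')" -k2,2gr | head -n "${1:-10}"
}

# Assumption: the hadoop CLI is on the PATH and the output lives in HDFS.
# The glob is quoted so that hadoop, not the local shell, expands it.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -cat 'pagerank_out/part-*' | top_ranks 10
fi
```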
4. Run the GraphX Connected Components test program
This benchmark can use the same input as PageRank. Run it as follows:
./bin/run-example org.apache.spark.graphx.lib.Analytics spark://$MASTERIP:7077 cc hdfs://$HDFSIP:8020/soc-LiveJournal1.txt --numIter=20 --numEPart=192
References:
1. https://github.com/amplab/graphx/wiki/Launch-a-benchmarking-cluster
2. http://blog.csdn.net/qianlong4526888/article/details/21441131
3. http://spark.apache.org/docs/latest/graphx-programming-guide.html#pagerank