Shark 0.9.1 Installation Notes
I. Environment

Component versions:
- hadoop: 2.3.0
- spark: 0.9.0
- shark: 0.9.1-hadoop2
- hive: 0.11.0
- jdk: Oracle HotSpot 1.7.0_55 (note: JDK 1.6 fails with a version-incompatibility error)
- OS: Ubuntu 12.04 x86
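Since the JDK version is the most common tripwire here, a quick pre-flight check is worth doing before anything else. A minimal sketch (`check_jdk` is a hypothetical helper, not part of any Shark script):

```shell
#!/bin/sh
# Decide whether a JVM version string is acceptable for Shark 0.9.1
# (it fails with a version-incompatibility error under JDK 1.6).
check_jdk() {
  case "$1" in
    1.7.*) echo "OK: $1" ;;
    *)     echo "incompatible JDK: $1 (need 1.7)" ;;
  esac
}

# Feed it the version reported by the JVM on this machine:
check_jdk "$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')"
```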
II. Installation steps

1) Installation and configuration of Hadoop, Spark, and Hive are omitted here.
2) Shark installation:
- Download the Shark 0.9.1 binary release.
- JDK: again, JDK 1.6 fails with a version-incompatibility error; use JDK 1.7.
- Configuration: set yarn.application.classpath (in yarn-site.xml):
```xml
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
    $SHARK_HIVE/hive-hbase-handler/hive-hbase-handler-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-anttasks/hive-anttasks-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-service/hive-service-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-serde/hive-serde-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-metastore/hive-metastore-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-hwi/hive-hwi-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-exec/hive-exec-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-beeline/hive-beeline-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-shims/hive-shims-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-common/hive-common-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-cli/hive-cli-0.11.0-shark-0.9.1.jar,
    $SHARK_HIVE/hive-jdbc/hive-jdbc-0.11.0-shark-0.9.1.jar
  </value>
</property>
```

The $SHARK_HIVE entries point at the Hive jars bundled inside Shark; the remaining entries are copied from yarn-default.xml. Then add the following to yarn-env.sh:

```shell
export SHARK_HIVE=$SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark
```

The hive-exec jar shipped inside the Shark package (/lib_managed/jars/edu.berkeley.cs.shark/hive-exec/hive-exec-0.11.0-shark-0.9.1.jar) bundles protobuf classes that clash with the version of protobuf*.jar already on the cluster, so everything under com/google/protobuf inside hive-exec-0.11.0-shark-0.9.1.jar has to be deleted by hand. I did this on Windows with the RAR tool; you can also unpack the jar with the jar command and repack it.
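The deletion can also be done from the command line with zip. A sketch under stated assumptions: the demo jar built here is a stand-in for hive-exec, not the real Shark layout; the only real step is the `zip -qd` line.

```shell
#!/bin/sh
# Build a throwaway jar that mimics the problem: one class file under
# com/google/protobuf (stand-in for the classes bundled in hive-exec).
work=$(mktemp -d)
mkdir -p "$work/com/google/protobuf"
echo "stub" > "$work/com/google/protobuf/Message.class"
(cd "$work" && zip -qr demo.jar com)

# The actual fix: delete everything under com/google/protobuf in place.
zip -qd "$work/demo.jar" 'com/google/protobuf*'

# Verify nothing protobuf-related is left in the archive listing.
if unzip -l "$work/demo.jar" | grep -q protobuf; then
  echo "protobuf classes still present"
else
  echo "protobuf classes removed"
fi
rm -rf "$work"
```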
3) Modify $SHARK_HOME/run to add the spark-assembly jar to the classpath:

```shell
if [ -f "$SPARK_JAR" ] ; then
  SPARK_CLASSPATH+=":$SPARK_JAR"
fi
```
4) Modify $SHARK_HOME/conf/shark-env.sh:

```shell
export SPARK_MEM=1g
export JAVA_HOME=/opt/jdk1.6.0_45

# (Required) Set the master program's memory
export SHARK_MASTER_MEM=1g

# (Optional) Specify the location of Hive's configuration directory. By default,
# Shark run scripts will point it to $SHARK_HOME/conf
export HIVE_CONF_DIR="$INSTALL_HOME/hive-0.11.0-bin/conf"

# For running Shark in distributed mode, set the following:
export HADOOP_HOME="$INSTALL_HOME/hadoop-2.2.0"
export HIVE_HOME="$INSTALL_HOME/hive-0.11.0-bin"
export SPARK_HOME="$INSTALL_HOME/spark-0.9.1-bin-hadoop2"
export MASTER="spark://xxxxx:7077"

# Only required if using Mesos:
#export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so

# Only required if running Shark with Spark on YARN
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR="$INSTALL_HOME/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar"
export SHARK_ASSEMBLY_JAR="$INSTALL_HOME/shark-0.9.1-bin-hadoop2/target/scala-2.10/shark_2.10-0.9.1.jar"

# (Optional) Extra classpath
#export SPARK_LIBRARY_PATH=""

# Java options
# On EC2, change the local.dir to /mnt/tmp
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS

# (Optional) Tachyon Related Configuration
#export TACHYON_MASTER=""            # e.g. "localhost:19998"
#export TACHYON_WAREHOUSE_PATH=/sharktables  # Could be any valid path name
```
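A typo in either assembly-jar path only surfaces at runtime, so it is worth verifying the files exist after editing shark-env.sh. A small sketch (`check_jars` is a hypothetical helper):

```shell
#!/bin/sh
# Print a found/MISSING line for each jar path passed in.
check_jars() {
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      echo "found:   $jar"
    else
      echo "MISSING: $jar"
    fi
  done
}

# After sourcing shark-env.sh, check the two assembly jars it exports:
# . "$SHARK_HOME/conf/shark-env.sh"
check_jars "$SPARK_ASSEMBLY_JAR" "$SHARK_ASSEMBLY_JAR"
```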
5) Copy Hive and Shark to the same directory on every slave.
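Step 5 can be scripted. A sketch that reads hostnames from Hadoop's slaves file; the file location and $INSTALL_HOME layout are assumptions, and the `echo` makes it a dry run (remove it to actually copy):

```shell
#!/bin/sh
# Dry-run: print the rsync command that would push Hive and Shark to
# each slave listed in the given file; remove "echo" to actually copy.
push_dirs() {
  slaves_file=$1
  while read -r host; do
    [ -n "$host" ] || continue
    echo rsync -az "$HIVE_HOME" "$SHARK_HOME" "$host:$INSTALL_HOME/"
  done < "$slaves_file"
}

# Example: push_dirs "$HADOOP_HOME/etc/hadoop/slaves"
```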
6) Also, when upgrading Hive I reused the conf files from the old version directly and got: ClassNotFoundException: org.apache.hadoop.log.metrics.EventCounter. Taking the log4j properties file shipped with Hive 0.11 and tweaking it fixed the problem.
III. Running queries with Shark

```shell
./bin/shark-withinfo
./bin/shark-withdebug
```

These two commands differ only in the Hive log level they set.

```
shark> select count(1) from test.item_basic_info where 1=1;
23.959: [GC 272587K->25855K(1005312K), 0.0207550 secs]
OK
39793
Time taken: 6.307 seconds
shark> select count(1) from test.item_basic_info where 1=1;
OK
39793
Time taken: 1.22 seconds
```

Seems pretty fast.
IV. References

Releases: https://github.com/amplab/shark/releases
Official installation guide: https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster
Troubleshooting: http://blog.csdn.net/baiyangfu_love/article/details/23769147