[Gandalf] Deploying Hive 0.13.1 on Hadoop 2.2.0 with Oracle 10g: A Detailed Guide

Shared by LinuxBoy on 2019-03-27 05:03:40

Environment:
    Hadoop 2.2.0
    Hive 0.13.1
    Ubuntu 14.04 LTS
    java version "1.7.0_60"
    Oracle 10g

***Reposting is welcome; please credit the source***
http://blog.csdn.net/u010967382/article/details/38709751

Download the installation package from:
    http://mirrors.cnnic.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz

Unpack the archive on the server to:
    /home/fulong/Hive/apache-hive-0.13.1-bin

Update the environment variables by adding the following lines:
    export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin
    export PATH=$HIVE_HOME/bin:$PATH
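If the two export lines above live in a profile file that gets sourced more than once, PATH picks up a duplicate entry each time. The sketch below shows one way to avoid that. The helper name prepend_path is made up for this illustration and is not part of Hive; the HIVE_HOME value is the path used in this article.

```shell
# Path used in this article; adjust it for your own machine.
export HIVE_HOME=/home/fulong/Hive/apache-hive-0.13.1-bin

# prepend_path (hypothetical helper): put a directory at the front of PATH
# unless it is already listed somewhere in PATH.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present: leave PATH untouched
    *) PATH="$1:$PATH" ;;     # not present: prepend it
  esac
}

prepend_path "$HIVE_HOME/bin"
export PATH
```

Sourcing this snippet repeatedly leaves a single $HIVE_HOME/bin entry in PATH.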

Go to the conf directory and copy the template configuration files under new names:
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ ls
    hive-default.xml.template  hive-exec-log4j.properties.template
    hive-env.sh.template       hive-log4j.properties.template
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-env.sh.template hive-env.sh
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-default.xml.template hive-site.xml
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ ls
    hive-default.xml.template  hive-env.sh.template                 hive-log4j.properties.template
    hive-env.sh                hive-exec-log4j.properties.template  hive-site.xml

Edit the following entries in hive-env.sh to specify the Hadoop root directory and Hive's conf and lib directories:
    # Set HADOOP_HOME to point to a specific hadoop install directory
    HADOOP_HOME=/home/fulong/Hadoop/hadoop-2.2.0
    # Hive Configuration Directory can be controlled by:
    export HIVE_CONF_DIR=/home/fulong/Hive/apache-hive-0.13.1-bin/conf
    # Folder containing extra libraries required for hive compilation/execution can be controlled by:
    export HIVE_AUX_JARS_PATH=/home/fulong/Hive/apache-hive-0.13.1-bin/lib

Edit the following Oracle connection parameters in hive-site.xml:
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:oracle:thin:@192.168.0.138:1521:orcl</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>oracle.jdbc.driver.OracleDriver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>username to use against metastore database</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hivefbi</value>
      <description>password to use against metastore database</description>
    </property>
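The ConnectionURL value above uses the Oracle thin-driver form host:port:SID. As a small illustration of how the three parts fit together, the sketch below assembles the URL from its components. The function name oracle_thin_url is hypothetical; the host, port, and SID are the ones from this article.

```shell
# oracle_thin_url (hypothetical helper): build a thin-driver JDBC URL.
#   $1 = database host, $2 = listener port, $3 = Oracle SID
oracle_thin_url() {
  printf 'jdbc:oracle:thin:@%s:%s:%s\n' "$1" "$2" "$3"
}

# Values taken from the hive-site.xml settings in this article.
oracle_thin_url 192.168.0.138 1521 orcl
# prints: jdbc:oracle:thin:@192.168.0.138:1521:orcl
```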

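Configure log4j
Create a log4j directory under $HIVE_HOME to hold the log files, then copy the template under a new name:
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin/conf$ cp hive-log4j.properties.template hive-log4j.properties
In hive-log4j.properties, change the directory where logs are stored:
    hive.log.dir=/home/fulong/Hive/apache-hive-0.13.1-bin/log4j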
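The log directory step can also be scripted. The following is only a sketch: set_hive_log_dir is a made-up helper name, and it assumes the properties file contains exactly one line beginning with hive.log.dir= and that the directory path contains no | or & characters (they would confuse the sed expression).

```shell
# set_hive_log_dir (hypothetical helper)
#   $1 = path to hive-log4j.properties, $2 = directory for the log files
set_hive_log_dir() {
  # Create the log directory if it does not exist yet.
  mkdir -p "$2" || return 1
  # Rewrite only the hive.log.dir line; all other lines pass through unchanged.
  sed "s|^hive\.log\.dir=.*|hive.log.dir=$2|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example call with the paths used in this article (not executed here):
#   set_hive_log_dir "$HIVE_HOME/conf/hive-log4j.properties" "$HIVE_HOME/log4j"
```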

Copy the Oracle JDBC jar
Copy the JDBC jar that matches your Oracle version into $HIVE_HOME/lib.
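Before starting Hive it can be worth confirming that the driver jar really landed in the lib directory. The check below is a sketch built on an assumption: that the driver file name matches ojdbc*.jar (for example ojdbc14.jar, which is the usual name for the Oracle 10g thin driver). The article does not name the file, so adjust the pattern if your jar is called something else. The helper name has_oracle_jdbc_jar is hypothetical.

```shell
# has_oracle_jdbc_jar (hypothetical helper)
#   $1 = Hive lib directory
# Returns 0 if at least one file matching ojdbc*.jar exists there, 1 otherwise.
has_oracle_jdbc_jar() {
  for jar in "$1"/ojdbc*.jar; do
    # If nothing matches, the pattern stays literal and -f fails.
    [ -f "$jar" ] && return 0
  done
  return 1
}

# Example usage (the source path of the jar is a placeholder):
#   cp /path/to/ojdbc14.jar "$HIVE_HOME/lib/"
#   has_oracle_jdbc_jar "$HIVE_HOME/lib" || echo "Oracle JDBC jar not found"
```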

Start Hive
    fulong@FBI006:~/Hive/apache-hive-0.13.1-bin$ hive
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    14/08/20 17:14:05 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
    14/08/20 17:14:05 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    Logging initialized using configuration in file:/home/fulong/Hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
    Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/fulong/Hadoop/hadoop-2.2.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
    hive>

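Verification
To verify the installation, create a table that stores the user search-behavior logs published by Sogou Labs. The data can be downloaded from:
    http://www.sogou.com/labs/dl/q.html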

First, create the table:
    hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
At this point the statement fails with:
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : ORA-01754: a table may contain only one column of type LONG
Solution: open hive-metastore-0.13.1.jar in ${HIVE_HOME}/lib with an archive tool and locate the file named package.jdo. Open that file and find the following section:
    <field name="viewOriginalText" default-fetch-group="false">
      <column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
    </field>
    <field name="viewExpandedText" default-fetch-group="false">
      <column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
    </field>
Both columns, VIEW_ORIGINAL_TEXT and VIEW_EXPANDED_TEXT, are declared as LONGVARCHAR, which maps to LONG in Oracle. Oracle allows at most one LONG column per table, so creating the metastore table with two of them raises ORA-01754.
Following the recommendation on the Hive website, change the jdbc-type of both columns to CLOB. The modified section looks like this:
    <field name="viewOriginalText" default-fetch-group="false">
      <column name="VIEW_ORIGINAL_TEXT" jdbc-type="CLOB"/>
    </field>
    <field name="viewExpandedText" default-fetch-group="false">
      <column name="VIEW_EXPANDED_TEXT" jdbc-type="CLOB"/>
    </field>
After saving the change back into the jar, restart Hive.
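The package.jdo edit can be done from the command line instead of an archive tool. The sketch below rewrites only the two VIEW_*_TEXT columns and leaves any other LONGVARCHAR column alone. The function name patch_view_text_columns is hypothetical, and the commented workflow assumes that package.jdo sits at the root of the metastore jar and that unzip and the JDK jar tool are installed; check both on your system, and keep a backup of the jar.

```shell
# patch_view_text_columns (hypothetical helper)
#   $1 = path to an extracted package.jdo
# Changes jdbc-type from LONGVARCHAR to CLOB for VIEW_ORIGINAL_TEXT and
# VIEW_EXPANDED_TEXT only.
patch_view_text_columns() {
  sed -E 's/(name="VIEW_(ORIGINAL|EXPANDED)_TEXT" jdbc-type=")LONGVARCHAR"/\1CLOB"/' \
    "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Possible workflow (not executed here; <metastore-jar> stands for the
# hive-metastore jar found in $HIVE_HOME/lib):
#   cd "$HIVE_HOME/lib"
#   cp <metastore-jar> ~/metastore-jar.backup     # keep a copy outside lib
#   unzip -o <metastore-jar> package.jdo          # extract the mapping file
#   patch_view_text_columns package.jdo           # apply the CLOB change
#   jar uf <metastore-jar> package.jdo            # write it back into the jar
#   rm package.jdo
```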
Run the create table statement again. This time it succeeds:
    hive> create table searchlog (time string,id string,sword string,rank int,clickrank int,url string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
    OK
    Time taken: 0.986 seconds
Load the local data file into the table:
    hive> load data local inpath '/home/fulong/Downloads/SogouQ.reduced' overwrite into table searchlog;
    Copying data from file:/home/fulong/Downloads/SogouQ.reduced
    Copying file: file:/home/fulong/Downloads/SogouQ.reduced
    Loading data to table default.searchlog
    rmr: DEPRECATED: Please use 'rm -r' instead.
    Deleted hdfs://fulonghadoop/user/hive/warehouse/searchlog
    Table default.searchlog stats: [numFiles=1, numRows=0, totalSize=152006060, rawDataSize=0]
    OK
    Time taken: 25.705 seconds
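The load step copies the file as-is and does not check it against the table definition, so rows that do not split into the six tab-separated columns of searchlog would only show up later as NULL or truncated values in queries. As a hedged pre-load check, the sketch below counts such rows. The helper name count_malformed_rows is hypothetical.

```shell
# count_malformed_rows (hypothetical helper)
#   $1 = data file, $2 = expected number of tab-separated fields per line
# Prints how many lines have a different field count.
count_malformed_rows() {
  awk -F'\t' -v n="$2" 'NF != n { bad++ } END { print bad + 0 }' "$1"
}

# Example usage with the file from this article (not executed here):
#   count_malformed_rows /home/fulong/Downloads/SogouQ.reduced 6
```

A result of 0 means every line matches the six-column layout; any other number suggests the file needs preprocessing before it is loaded.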
List all tables:
    hive> show tables;
    OK
    searchlog
    Time taken: 0.139 seconds, Fetched: 1 row(s)
Count the rows:
    hive> select count(*) from searchlog;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1407233914535_0001, Tracking URL = http://FBI003:8088/proxy/application_1407233914535_0001/
    Kill Command = /home/fulong/Hadoop/hadoop-2.2.0/bin/hadoop job -kill job_1407233914535_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2014-08-20 18:03:17,667 Stage-1 map = 0%, reduce = 0%
    2014-08-20 18:04:05,426 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.46 sec
    2014-08-20 18:04:27,317 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.74 sec
    MapReduce Total cumulative CPU time: 4 seconds 740 msec
    Ended Job = job_1407233914535_0001
    MapReduce Jobs Launched:
    Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.74 sec HDFS Read: 152010455 HDFS Write: 8 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 740 msec
    OK
    1724264
    Time taken: 103.154 seconds, Fetched: 1 row(s)
