[Hive] Writing data into a Hive partitioned table with MapReduce
Business requirement:
Write the data produced each day into a Hive partitioned table (partitioned by date).
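Analysis:
Writing data into a Hive table with MapReduce really just means writing files into the table's directory on HDFS. The catch is that the data has to land in the current day's partition, so the problem becomes: how do we create the table's partition for the current day in advance?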
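The mapping from a partition value to its HDFS directory can be sketched in shell as follows. This is only an illustration under stated assumptions: the helper name `partition_dir` is hypothetical, the table directory is the one that appears in the `--job2OutputPath` argument of the job command in this post, and the `YYYY-MM-DD` format for `$date` is a guess, since the post does not show how `$date` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the HDFS directory of one partition,
# given the table directory ($1) and the value of the partition column ds ($2).
partition_dir() {
    printf '%s/ds=%s\n' "$1" "$2"
}

# Assumed date format; the original post does not show how $date is defined.
date=$(date +%Y-%m-%d)

# Table directory as it appears in the --job2OutputPath argument of the job.
table_dir=/user/hive/pms/test_rcmd_valid_path

# Print the directory the MapReduce job would have to write into today.
partition_dir "$table_dir" "$date"
```

Files placed in that directory only become visible to queries once the `ds` partition is registered in the Hive metastore, which is why the partition has to exist before the job runs.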
Solution:
1. Create the Hive table
# First create the partitioned table rcmd_valid_path
hive -e "
set mapred.job.queue.name=pms;
drop table if exists pms.test_rcmd_valid_path;
create table if not exists pms.test_rcmd_valid_path (
    track_id string,
    track_time string,
    session_id string,
    gu_id string,
    end_user_id string,
    page_category_id bigint,
    algorithm_id int,
    is_add_cart int,
    rcmd_product_id bigint,
    product_id bigint,
    path_id string,
    path_type string,
    path_length int,
    path_list string,
    order_code string,
    groupon_id bigint
)
partitioned by (ds string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';"
2. Create the table's partition for the current day, $date (the partition is created if it does not already exist)
# Create the directory for the $date partition of the production table rcmd_valid_path
hive -e "
set mapred.job.queue.name=pms;
insert overwrite table pms.test_rcmd_valid_path partition(ds='$date')
select
    track_id,
    track_time,
    session_id,
    gu_id,
    end_user_id,
    page_category_id,
    algorithm_id,
    is_add_cart,
    rcmd_product_id,
    product_id,
    path_id,
    path_type,
    path_length,
    path_list,
    order_code,
    groupon_id
from pms.test_rcmd_valid_path
where ds = '$date';"
3. The job can then write directly into the partition (note job2OutputPath)
hadoop jar lib/bigdata-datamining-1.1-user-trace-jar-with-dependencies.jar com.yhd.datamining.data.usertrack.offline.job.mapred.TrackPathJob \
    --similarBrandPath /user/pms/recsys/algorithm/schedule/warehouse/relation/brand/$yesterday \
    --similarCategoryPath /user/pms/recsys/algorithm/schedule/warehouse/relation/category/$yesterday \
    --mcSiteCategoryPath /user/hive/warehouse/mc_site_category \
    --extractPreprocess /user/hive/warehouse/test_extract_preprocess \
    --engineMatchRule /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday \
    --artificialMatchRule /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday \
    --category /user/hive/warehouse/category \
    --keywordCategoryTopN 3 \
    --termCategory /user/hive/pms/temp_term_category \
    --extractGrouponInfo /user/hive/pms/extract_groupon_info \
    --extractProductSerial /user/hive/pms/product_serial_id \
    --job1OutputPath /user/pms/workspace/ouyangyewei/testUsertrack/job1Output \
    --job2OutputPath /user/hive/pms/test_rcmd_valid_path/ds=$date
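As a possible alternative to the self-referencing INSERT OVERWRITE in step 2, Hive's DDL also provides `ALTER TABLE ... ADD IF NOT EXISTS PARTITION`, which registers a partition without running a query. The sketch below is not the method used in this post: the helper name `add_partition_sql` is hypothetical, the `YYYY-MM-DD` default for `$date` is an assumption, and the `hive` call is guarded so the script only prints the statement on machines without the Hive CLI.

```shell
#!/bin/sh
# Hypothetical helper: build the DDL that adds the partition ds=$2
# to table $1 only if that partition is not registered yet.
add_partition_sql() {
    printf "alter table %s add if not exists partition (ds='%s');\n" "$1" "$2"
}

# Reuse $date if the caller already set it; otherwise fall back to an
# assumed YYYY-MM-DD format (the original post does not define $date).
date=${date:-$(date +%Y-%m-%d)}

sql=$(add_partition_sql pms.test_rcmd_valid_path "$date")
echo "$sql"

# Run the statement only where the hive CLI is actually installed.
if command -v hive >/dev/null 2>&1; then
    hive -e "$sql"
fi
```

Because the statement uses `if not exists`, running it again on the same day leaves an existing partition untouched, so it could be run before each job launch.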