Hive之简单查询不启用MapReduce，hive启用mapreduce

文章由LinuxBoy分享于2019-03-27 10:03:23热评（69）

Hive之简单查询不启用MapReduce，hive启用mapreduce

如果你想查询某个表的某一列，Hive默认是会启用MapReduce Job来完成这个任务，如下：

`01`	`hive> SELECT id, money FROM m limit` `10;`

`02`	`Total MapReduce jobs =` `1`

`03`	`Launching Job` `1` `out of` `1`

`04`	`Number of reduce tasks is set to` `0` `since there's no reduce operator`

`05`	`Cannot run job locally: Input Size (=` `235105473) is larger than`

`06`	`hive.exec.mode.local.auto.inputbytes.max (=` `134217728)`

`07`	`Starting Job = job_1384246387966_0229, Tracking URL =`

08

`09`	`http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/`

10

`11`	`Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job`

`12`	`-kill job_1384246387966_0229`

`13`	`hadoop job information` `forStage-1: number of mappers:` `1;`

`14`	`number of reducers:` `0`

`15`	`2013-11-13` `11:35:16,167` `Stage-1` `map =` `0%, reduce =` `0%`

`16`	`2013-11-13` `11:35:21,327` `Stage-1` `map =` `100%, reduce =` `0%,`

`17`	`Cumulative CPU` `1.26` `sec`

`18`	`2013-11-13` `11:35:22,377` `Stage-1` `map =` `100%, reduce =` `0%,`

`19`	`Cumulative CPU` `1.26` `sec`

`20`	`MapReduce Total cumulative CPU time:` `1` `seconds` `260` `msec`

`21`	`Ended Job = job_1384246387966_0229`

`22`	`MapReduce Jobs Launched:`

`23`	`Job` `0: Map:` `1` `Cumulative CPU:` `1.26sec`

`24`	`HDFS Read:` `8388865` `HDFS Write:` `60` `SUCCESS`

`25`	`Total MapReduce CPU Time Spent:` `1` `seconds` `260` `msec`

26 OK

27 1 122

28 1 185

29 1 231

30 1 292

31 1 316

32 1 329

33 1 355

34 1 356

35 1 362

36 1 364

`37`	`Time taken:` `16.802` `seconds, Fetched:` `10` `row(s)`

　　我们都知道，启用MapReduce Job是会消耗系统开销的。对于这个问题，从Hive0.10.0版本开始，对于简单的不需要聚合的类似SELECT <col> from <table> LIMIT n语句，不需要起MapReduce job，直接通过Fetch task获取数据，可以通过下面几种方法实现：
　　方法一：

`01`	`hive> set hive.fetch.task.conversion=more;`

`02`	`hive> SELECT id, money FROM m limit` `10;`

03 OK

04 1 122

05 1 185

06 1 231

07 1 292

08 1 316

09 1 329

10 1 355

11 1 356

12 1 362

13 1 364

`14`	`Time taken:` `0.138` `seconds, Fetched:` `10` `row(s)`

上面 set hive.fetch.task.conversion=more;开启了Fetch任务，所以对于上述简单的列查询不在启用MapReduce job！
　　方法二：

`1`	`bin/hive --hiveconf hive.fetch.task.conversion=more`

　　方法三：
上面的两种方法都可以开启了Fetch任务，但是都是临时起作用的；如果你想一直启用这个功能，可以在${HIVE_HOME}/conf/hive-site.xml里面加入以下配置：

`01`	`<property>`

`02`	`<name>hive.fetch.task.conversion</name>`

`03`	`<value>more</value>`

`04`	`<description>`

`05`	`Some select queries can be converted to single FETCH task`

`06`	`minimizing latency.Currently the query should be single`

`07`	`sourced not having any subquery and should not have`

`08`	`any aggregations or distincts (which incurrs RS),`

`09`	`lateral views and joins.`

`10`	`1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only`

`11`	`2. more : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)`

`12`	`</description>`

`13`	`</property>`

这样就可以长期启用Fetch任务了，很不错吧，也赶紧去试试吧！

推荐文章：

Hive之简单查询不启用MapReduce，hive启用mapreduce