Hive-On-Tez Performance Test



Table of Contents

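Hive-On-Tez Test
MRR Computation Model Test
MPJ Computation Model Test

Hive-On-Tez Test

Tez gives a clear performance improvement on workloads that follow the MRR and MPJ computation models. The tests are as follows: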

MRR Computation Model Test

  • Test tables

    1.users(id,name,password): 10 million records in total;

    2.peoples(id,name,gender,address): 10 million records in total;

    3.gender_summary(gender,count)

    4.address_summary(address,count)

  • Test statement

    FROM (SELECT u.username, p.sex, p.address FROM users u 
        JOIN peoples p ON u.userid = p.id) 
    subql
    INSERT OVERWRITE TABLE gender_summary 
        SELECT subql.sex, count(*) GROUP BY subql.sex
    INSERT OVERWRITE TABLE address_summary 
        SELECT subql.address, count(*) GROUP BY subql.address;
    					
  • DAG (directed acyclic graph) of the job: [figure]


  • Results

    1. Running on MapReduce

      MapReduce Jobs Launched: 
      Stage-Stage-2: Map: 2  Reduce: 3   Cumulative CPU: 220.78 sec
      Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 4.23 sec
      Stage-Stage-4: Map: 1  Reduce: 1   Cumulative CPU: 4.08 sec
      Total MapReduce CPU Time Spent: 3 minutes 49 seconds 90 msec
      Time taken: 186.853 seconds
      Three runs took 186.853, 188.748, and 191.812 seconds; average: 189.14 seconds.
      							
    2. Running on Tez

      --------------------------------------------------------------------------------
              VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
      --------------------------------------------------------------------------------
      Map 1 ..........   SUCCEEDED      5          5        0        0       0       0
      Map 5 ..........   SUCCEEDED      6          6        0        0       0       0
      Reducer 2 ......   SUCCEEDED      2          2        0        0       0       0
      Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
      Reducer 4 ......   SUCCEEDED      1          1        0        0       0       0
      --------------------------------------------------------------------------------
      VERTICES: 05/05  [==========================>>] 100%  ELAPSED TIME: 56.23 s    
      --------------------------------------------------------------------------------
      Time taken: 60.348 seconds
      Three runs took 60.348, 60.441, and 61.311 seconds; average: 60.7 seconds.
      							

    On average, Tez finished about 3.1 times faster than MapReduce (60.7 s versus 189.14 s).
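
    The comparison above can be reproduced within one Hive session by switching the execution engine before submitting the same statement. The following is a minimal sketch, assuming a Hive installation with Tez set up on the cluster; the original post does not show its session settings, so the SET lines are an assumption about how the two runs were selected.

    ```sql
    -- Assumption: Tez is installed and available to Hive on this cluster.

    -- First run: classic MapReduce engine.
    SET hive.execution.engine=mr;
    -- ... submit the multi-insert test statement here ...

    -- Second run: same session, same statement, Tez engine.
    SET hive.execution.engine=tez;
    FROM (SELECT u.username, p.sex, p.address FROM users u
        JOIN peoples p ON u.userid = p.id)
    subql
    INSERT OVERWRITE TABLE gender_summary
        SELECT subql.sex, count(*) GROUP BY subql.sex
    INSERT OVERWRITE TABLE address_summary
        SELECT subql.address, count(*) GROUP BY subql.address;
    ```

    Running each engine several times and averaging, as done here, smooths out scheduling noise between runs.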

MPJ Computation Model Test

  • Test tables

    1.users(id,name,password): 10 million records in total;

    2.peoples(id,name,gender,address): 10 million records in total;

    3.permissions(userid,name)

  • Test statement

    SELECT u.userid, p.name, q.name FROM users u 
        JOIN peoples p ON u.userid = p.id 
        JOIN permissions q ON p.id = q.userId;
    					
  • DAG (directed acyclic graph) of the job: [figure]


  • Results

    1. Running on MapReduce

      MapReduce Jobs Launched: 
      Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 177.33 sec
      Total MapReduce CPU Time Spent: 2 minutes 57 seconds 330 msec
      OK
      Time taken: 104.208 seconds, Fetched: 5 row(s)
      Three runs took 104.208, 102.146, and 103.537 seconds; average: 103.297 seconds.
      							
    2. Running on Tez

      --------------------------------------------------------------------------------
              VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
      --------------------------------------------------------------------------------
      Map 1 ..........   SUCCEEDED      5          5        0        0       0       0
      Map 3 ..........   SUCCEEDED      6          6        0        0       0       0
      Map 4 ..........   SUCCEEDED      1          1        0        0       0       0
      Reducer 2 ......   SUCCEEDED      2          2        0        0       0       0
      --------------------------------------------------------------------------------
      VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 47.50 s    
      --------------------------------------------------------------------------------
      OK
      Time taken: 49.143 seconds, Fetched: 5 row(s)
      Three runs took 49.143, 47.284, and 48.578 seconds; average: 48.335 seconds.
      							

    On average, Tez finished a little over 2 times faster than MapReduce (48.335 s versus 103.297 s, about 2.1x).
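
    The DAG that Tez builds for this join can also be inspected from Hive itself. As a sketch, assuming the session is running on Tez: prefixing the statement with EXPLAIN prints the query plan without executing it, and under Tez that plan lists the vertices (such as Map 1 or Reducer 2) and the edges connecting them.

    ```sql
    -- Assumption: Tez is installed and available to Hive on this cluster.
    SET hive.execution.engine=tez;

    -- Print the plan only; the query itself is not executed.
    EXPLAIN
    SELECT u.userid, p.name, q.name FROM users u
        JOIN peoples p ON u.userid = p.id
        JOIN permissions q ON p.id = q.userId;
    ```

    Comparing the vertex names in the plan with the progress table above shows which map and reduce vertices the single Tez job combines.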

Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
