Hadoop put vs. copyFromLocal: a performance comparison


Hadoop version: 1.0.3

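Today I tested the put and copyFromLocal commands of the Hadoop shell. Before testing, I searched online and found this article: http://hakunamapdata.com/why-put-is-better-than-copyfromlocal-when-coping-files-to-hdfs/ . It claims that put is better than copyFromLocal and goes on at some length, but its only real conclusion is that put is shorter and therefore easier to type. Not quite what I was hoping for.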

The article also explains how copyFromLocal relates to put: in the source code, CopyFromLocal extends Put, as shown here:

  public static class CopyFromLocal extends Put {
    public static final String NAME = "copyFromLocal";
    public static final String USAGE = Put.USAGE;
    public static final String DESCRIPTION = "Identical to the -put command.";
  }

The class only declares three constants and nothing else, so all of its behavior is inherited from Put.
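To make the point concrete, here is a minimal sketch of the same pattern. The classes below are simplified, hypothetical stand-ins I wrote for illustration (the `run` method and its return value are my own assumptions, not the real Hadoop source); they only show that a subclass which redeclares constants and overrides no method runs exactly the parent's code.

```java
// Simplified, hypothetical stand-ins for the Hadoop shell classes; not the real source.
class InheritanceSketch {

    static class Put {
        public static final String NAME = "put";

        // In this sketch, all of the copy logic lives in the parent class.
        String run(String src, String dst) {
            return "copy " + src + " -> " + dst;
        }
    }

    // Like CopyFromLocal above: only a constant is redeclared, no method is overridden.
    static class CopyFromLocal extends Put {
        public static final String NAME = "copyFromLocal";
    }

    public static void main(String[] args) {
        String viaPut = new Put().run("input1kk.log", "/user/hadoop/multi/");
        String viaCopyFromLocal = new CopyFromLocal().run("input1kk.log", "/user/hadoop/multi/");

        // Both calls execute the same inherited method body.
        System.out.println(viaPut.equals(viaCopyFromLocal)); // prints "true"
    }
}
```

Only the command name differs between the two classes, which is why, going by the code alone, no performance difference would be expected.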

However, when I then ran my own test, copyFromLocal came out faster than put.

Test data: 91.45 MB.

After the test, the logs looked as follows (originally shown as a screenshot; this is the text version):

2014-03-12 16:27:26,829 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 140 Total time for transactions(ms): 4 Number of transactions batched in Syncs: 1 Number of syncs: 91 SyncTimes(ms): 2002 
2014-03-12 16:27:26,892 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/hadoop/multi/input1kk01.log. blk_-4017144527627647037_6465
2014-03-12 16:27:29,391 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.151:50010 is added to blk_-4017144527627647037_6465 size 67108864
2014-03-12 16:27:29,392 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.173:50010 is added to blk_-4017144527627647037_6465 size 67108864
2014-03-12 16:27:29,392 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/hadoop/multi/input1kk01.log. blk_1917836084202604773_6465
2014-03-12 16:27:32,547 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.151:50010 is added to blk_1917836084202604773_6465 size 28780032
2014-03-12 16:27:32,548 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.173:50010 is added to blk_1917836084202604773_6465 size 28780032
2014-03-12 16:27:33,408 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on  /user/hadoop/multi/input1kk01.log from client DFSClient_NONMAPREDUCE_762983473_1
2014-03-12 16:27:33,408 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/hadoop/multi/input1kk01.log is closed by DFSClient_NONMAPREDUCE_762983473_1
2014-03-12 16:29:21,571 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 145 Total time for transactions(ms): 4 Number of transactions batched in Syncs: 2 Number of syncs: 93 SyncTimes(ms): 2004 
2014-03-12 16:29:21,623 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/hadoop/multi/input1kk.log. blk_-3650339820354531238_6466
2014-03-12 16:29:24,500 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.151:50010 is added to blk_-3650339820354531238_6466 size 67108864
2014-03-12 16:29:24,501 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/hadoop/multi/input1kk.log. blk_2819821776443202118_6466
2014-03-12 16:29:24,502 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.173:50010 is added to blk_-3650339820354531238_6466 size 67108864
2014-03-12 16:29:25,793 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.151:50010 is added to blk_2819821776443202118_6466 size 28780032
2014-03-12 16:29:25,801 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 172.20.42.173:50010 is added to blk_2819821776443202118_6466 size 28780032
2014-03-12 16:29:25,801 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on  /user/hadoop/multi/input1kk.log from client DFSClient_NONMAPREDUCE_-1410189759_1
2014-03-12 16:29:25,801 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/hadoop/multi/input1kk.log is closed by DFSClient_NONMAPREDUCE_-1410189759_1

------------------------------------------------------------------------------------------- NameNode log above, DataNode log below
2014-03-12 16:27:26,960 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_-4017144527627647037_6465 src: /172.20.43.135:52693 dest: /172.20.42.173:50010
2014-03-12 16:27:29,388 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.43.135:52693, dest: /172.20.42.173:50010, bytes: 67108864, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_762983473_1, offset: 0, srvID: DS-646906005-172.20.42.173-50010-1393847049195, blockid: blk_-4017144527627647037_6465, duration: 2425319001
2014-03-12 16:27:29,388 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for blk_-4017144527627647037_6465 terminating
2014-03-12 16:27:29,391 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_1917836084202604773_6465 src: /172.20.43.135:52694 dest: /172.20.42.173:50010
2014-03-12 16:27:32,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.43.135:52694, dest: /172.20.42.173:50010, bytes: 28780032, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_762983473_1, offset: 0, srvID: DS-646906005-172.20.42.173-50010-1393847049195, blockid: blk_1917836084202604773_6465, duration: 3149219756
2014-03-12 16:27:32,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for blk_1917836084202604773_6465 terminating
2014-03-12 16:29:21,634 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_-3650339820354531238_6466 src: /172.20.43.135:52708 dest: /172.20.42.173:50010
2014-03-12 16:29:24,488 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.43.135:52708, dest: /172.20.42.173:50010, bytes: 67108864, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1410189759_1, offset: 0, srvID: DS-646906005-172.20.42.173-50010-1393847049195, blockid: blk_-3650339820354531238_6466, duration: 2850592436
2014-03-12 16:29:24,488 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for blk_-3650339820354531238_6466 terminating
2014-03-12 16:29:24,500 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_2819821776443202118_6466 src: /172.20.43.135:52709 dest: /172.20.42.173:50010
2014-03-12 16:29:25,797 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /172.20.43.135:52709, dest: /172.20.42.173:50010, bytes: 28780032, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1410189759_1, offset: 0, srvID: DS-646906005-172.20.42.173-50010-1393847049195, blockid: blk_2819821776443202118_6466, duration: 1294984200
2014-03-12 16:29:25,797 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for blk_2819821776443202118_6466 terminating


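The output has two parts: the NameNode log on top and the DataNode log below it. I uploaded first with put and then with copyFromLocal. Measured in the NameNode log from the first allocateBlock to completeFile, the first upload took roughly 7 s (about 6.5 s) and the second roughly 4 s (about 4.2 s). So in this particular test, copyFromLocal performed better. To establish that copyFromLocal really is faster than put, though, the variables would have to be controlled and the test repeated, since other network traffic may have interfered while I was testing. Judging purely from the source code, the two commands should perform identically.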
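As a sanity check on those timings, the elapsed time of each upload can be computed from the NameNode log timestamps. This is only a small sketch under my own assumptions: the class and method names are made up for illustration, and I take the first allocateBlock line as the start of an upload and the completeFile line as its end.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the logs above; not part of Hadoop.
class UploadTiming {

    // Timestamp layout used by the NameNode log lines, e.g. "2014-03-12 16:27:26,892".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds between two log timestamps.
    static long elapsedMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TIME);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TIME);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // First upload (put): first allocateBlock -> completeFile.
        long put = elapsedMillis("2014-03-12 16:27:26,892", "2014-03-12 16:27:33,408");
        // Second upload (copyFromLocal): first allocateBlock -> completeFile.
        long copyFromLocal = elapsedMillis("2014-03-12 16:29:21,623", "2014-03-12 16:29:25,801");

        System.out.println("put: " + put + " ms");                     // put: 6516 ms
        System.out.println("copyFromLocal: " + copyFromLocal + " ms"); // copyFromLocal: 4178 ms
    }
}
```

A single pair of runs like this says little by itself. Repeating both commands several times, in alternating order, would show whether the gap is anything more than noise.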


Share, grow, be happy.

If you repost this, please credit the blog: http://blog.csdn.net/fansy1990


