Using the wordcount program from Hadoop's bundled examples


Hadoop ships with a number of example programs, but how do you actually run them? This post tests the wordcount program from the Hadoop examples jar:

1. Start Hadoop.

2. Prepare the data: run vim words and enter the following lines:

hello tom

hello jerry

hello kitty 

hello tom

hello bbb
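
If opening an editor is inconvenient, the same file can be created with a here-document. This is only a sketch: it assumes the blank lines between the sample lines above are page formatting rather than part of the file, and that the file is called words in the current directory, as in this step.

```shell
# Create the sample input file without opening an editor.
# Assumption: the file holds exactly the five lines shown above.
cat > words <<'EOF'
hello tom
hello jerry
hello kitty
hello tom
hello bbb
EOF

# Print the file to confirm its contents before uploading it.
cat words
```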

3. Upload the data to HDFS:

hadoop fs -put words /user/guest/words.txt

4. Run the wordcount program from the bundled examples jar. Invoking the jar without arguments lists the available programs:

guest@master:/usr/hadoop/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.4.0.jar 

An example program must be given as the first argument.

Valid program names are:

  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

  dbcount: An example job that count the pageview counts from a database.

  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

  grep: A map/reduce program that counts the matches of a regex in the input.

  join: A job that effects a join over sorted, equally partitioned datasets

  multifilewc: A job that counts words from several files.

  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

  randomwriter: A map/reduce program that writes 10GB of random data per node.

  secondarysort: An example defining a secondary sort to the reduce.

  sort: A map/reduce program that sorts the data written by the random writer.

  sudoku: A sudoku solver.

  teragen: Generate data for the terasort

  terasort: Run the terasort

  teravalidate: Checking results of terasort

  wordcount: A map/reduce program that counts the words in the input files.

  wordmean: A map/reduce program that counts the average length of the words in the input files.

  wordmedian: A map/reduce program that counts the median length of the words in the input files.

  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

The wordcount program appears in this list. Run it with the input file and an output directory as arguments:

hadoop jar hadoop-mapreduce-examples-2.4.0.jar wordcount /user/guest/words.txt /user/guest/wordcount
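
Running the same command a second time fails, because MapReduce refuses to write into an output directory that already exists. One way to handle this is sketched below. The function name rerun_wordcount is made up for illustration; the paths and the jar name are the ones used in this post, and the function is meant to be called from the directory that contains the jar.

```shell
# Hypothetical helper: clear the previous output, then run the example again.
rerun_wordcount() {
  out=/user/guest/wordcount
  # "hadoop fs -test -d" exits with status 0 when the directory exists.
  if hadoop fs -test -d "$out"; then
    # Remove the old output directory and everything in it.
    hadoop fs -rm -r "$out"
  fi
  hadoop jar hadoop-mapreduce-examples-2.4.0.jar wordcount /user/guest/words.txt "$out"
}
```

The block only defines the function. Call rerun_wordcount on a machine where the hadoop command is available to actually submit the job.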

View the result (the output is sorted by word): bbb 1, hello 5, jerry 1, kitty 1, tom 2
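
As a sanity check, the same counts can be reproduced locally with standard Unix tools. This is not how the Hadoop job is implemented, only a quick cross-check. The file names words_check and expected_counts are chosen here for illustration.

```shell
# Recreate the sample input locally (the same five lines as in step 2).
printf '%s\n' 'hello tom' 'hello jerry' 'hello kitty' 'hello tom' 'hello bbb' > words_check

# Put each word on its own line, drop empty lines, then count identical words.
# LC_ALL=C makes sort compare bytes, so the order does not depend on the locale.
# awk rearranges "count word" into "word<TAB>count".
tr -s ' \t' '\n' < words_check \
  | grep -v '^$' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{ print $2 "\t" $1 }' > expected_counts

cat expected_counts
```

On the cluster, the job output can be printed with hadoop fs -cat /user/guest/wordcount/part-r-00000, assuming the job ran with a single reducer. It should list the same five words and counts as expected_counts.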

Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.
