Finding the Maximum Integer with Hadoop MapReduce: A Detailed Walkthrough


Environment:

Linux: CentOS 6.3 (64-bit)

Hadoop 1.1.2

Eclipse (Linux version)

Code for the maximum-value algorithm:

package yunSave;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Find the maximum integer in the input
public class MaxValue extends Configured implements Tool {

    // Each mapper tracks the largest integer in its split and emits it once, in cleanup().
    public static class MapClass extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
        // Starts at 0, so the job assumes non-negative input.
        private int maxNum = 0;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] str = value.toString().split(" ");
            for (int i = 0; i < str.length; i++) {
                try {
                    int temp = Integer.parseInt(str[i]);
                    if (temp > maxNum) {
                        maxNum = temp;
                    }
                } catch (NumberFormatException e) {
                    // Ignore tokens that are not integers and keep scanning the line.
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new IntWritable(maxNum), new IntWritable(maxNum));
        }
    }

    // Takes the maximum of the per-mapper maxima. Also used as the combiner,
    // which is safe because taking a maximum is associative and commutative.
    public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private int maxNum = 0;

        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable val : values) {
                if (val.get() > maxNum) {
                    maxNum = val.get();
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new IntWritable(maxNum), new IntWritable(maxNum));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "MaxNum");
        job.setJarByClass(MaxValue.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MapClass.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        // Return the status instead of calling System.exit() here,
        // so that main() can still print the elapsed time.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        int res = ToolRunner.run(new Configuration(), new MaxValue(), args);
        // Elapsed time in nanoseconds
        System.out.println(System.nanoTime() - start);
        System.exit(res);
    }
}
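
The job boils down to two stages: each mapper keeps a running maximum for its own split, and the reducer takes the maximum of those per-mapper values. The following is a minimal plain-Java sketch of that idea with no Hadoop dependency. The class and method names (`TwoStageMaxSketch`, `localMax`, `globalMax`) and the sample strings are hypothetical, each string stands in for one input split, and the sketch skips non-integer tokens one at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration; not part of the MaxValue job.
class TwoStageMaxSketch {

    // "Map" stage: largest integer among the space-separated tokens of one split.
    // Starts at 0 like the job's maxNum field, so it assumes non-negative input.
    static int localMax(String split) {
        int max = 0;
        for (String token : split.split(" ")) {
            try {
                max = Math.max(max, Integer.parseInt(token));
            } catch (NumberFormatException e) {
                // Skip tokens that are not integers.
            }
        }
        return max;
    }

    // "Reduce" stage: largest of the per-split maxima.
    static int globalMax(List<Integer> localMaxima) {
        int max = 0;
        for (int value : localMaxima) {
            max = Math.max(max, value);
        }
        return max;
    }

    public static void main(String[] args) {
        String[] splits = {"4 8 15", "16 x 23 42"};
        List<Integer> localMaxima = new ArrayList<>();
        for (String split : splits) {
            localMaxima.add(localMax(split));
        }
        System.out.println(globalMax(localMaxima)); // prints 42
    }
}
```

Because only one number per split crosses the stage boundary, the amount of data shuffled stays tiny regardless of input size, which is the reason for emitting from cleanup() rather than from every map() call.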


Contents of the input files:

[work@master ~]$ hadoop fs -cat input_20141107/555.txt
Warning: $HADOOP_HOME is deprecated.


1 5 10 9999
[work@master ~]$


[work@master ~]$ hadoop fs -cat input_20141107/666.txt
Warning: $HADOOP_HOME is deprecated.


111 222 333 888
[work@master ~]$
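
Both sample files hold a single line of space-separated integers. As a rough illustration of how such a line turns into numbers, here is a plain-Java sketch; `LineTokensSketch` and `parseInts` are hypothetical names, and splitting on `\\s+` after `trim()` is an assumption that goes beyond the job's single-space split in order to tolerate tabs and repeated spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the MaxValue job does not use it.
class LineTokensSketch {

    // Returns the integers found on one line, skipping any token that does not parse.
    static List<Integer> parseInts(String line) {
        List<Integer> numbers = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            try {
                numbers.add(Integer.parseInt(token));
            } catch (NumberFormatException e) {
                // Not an integer (or an empty token from a blank line): ignore it.
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parseInts("1 5 10 9999"));     // prints [1, 5, 10, 9999]
        System.out.println(parseInts("111 222 333 888")); // prints [111, 222, 333, 888]
    }
}
```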

Running the job from Eclipse:

1. Arguments

Program arguments:

hdfs://master:9000/user/work/input_20141107  hdfs://master:9000/user/work/output_20141107

VM Arguments:

-Xms512m -Xmx1024m -XX:MaxPermSize=256m
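
The two program arguments become `args[0]` (input path) and `args[1]` (output path) in `run()`. The job itself does not validate them, so the following is only a sketch of a check one might add, using `java.net.URI` from the standard library. `JobArgsSketch` and `parse` are hypothetical names, and the usage message is made up for illustration.

```java
import java.net.URI;

// Hypothetical pre-flight check; the original job does not validate its arguments.
class JobArgsSketch {

    // Returns {input, output} as URIs, or throws if the arguments are unusable.
    static URI[] parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: MaxValue <input path> <output path>");
        }
        URI input = URI.create(args[0]);
        URI output = URI.create(args[1]);
        if (input.equals(output)) {
            throw new IllegalArgumentException("Input and output paths must differ");
        }
        return new URI[] {input, output};
    }

    public static void main(String[] args) {
        URI[] paths = parse(new String[] {
            "hdfs://master:9000/user/work/input_20141107",
            "hdfs://master:9000/user/work/output_20141107"
        });
        System.out.println(paths[0].getScheme()); // prints hdfs
        System.out.println(paths[0].getHost());   // prints master
        System.out.println(paths[0].getPort());   // prints 9000
        System.out.println(paths[0].getPath());   // prints /user/work/input_20141107
    }
}
```

The path component `/user/work/input_20141107` is the same directory that the relative path `input_20141107` names in the `hadoop fs -cat` commands, since HDFS resolves relative paths against the user's home directory `/user/<username>`.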

Clicking Run produces the result below.

Result:

[work@master ~]$ hadoop fs -cat output_20141107/part-r-00000
Warning: $HADOOP_HOME is deprecated.
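
TextOutputFormat writes each record as the key, a tab, and the value. Since the reducer emits the maximum as both key and value, and 9999 is the largest integer in the two input files, the part file would be expected to hold a single line with 9999 in both columns. The sketch below reads such a line back in plain Java. `OutputLineSketch` and `parseRecord` are hypothetical names, and the sample line is constructed for illustration rather than copied from a real run.

```java
// Hypothetical reader for one line of TextOutputFormat output (key, tab, value).
class OutputLineSketch {

    // Splits "key<TAB>value" into two ints; throws if the line has a different shape.
    static int[] parseRecord(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected key<TAB>value, got: " + line);
        }
        return new int[] {Integer.parseInt(fields[0]), Integer.parseInt(fields[1])};
    }

    public static void main(String[] args) {
        // Illustrative line, assuming the default tab separator.
        int[] record = parseRecord("9999\t9999");
        System.out.println("key=" + record[0] + " value=" + record[1]); // prints key=9999 value=9999
    }
}
```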
