Hadoop experiment: finding the minimum temperature in weather data
1. Download part of the data. Since this is only an experiment, I downloaded just part of the 2003 weather data.
2. Decompress the files and redirect the output into a single file with the zcat *gz > sample.txt command:
[hadoop@Master test_data]$ zcat *gz > /home/hadoop/input/sample.txt
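On a machine without zcat, the same decompress-and-concatenate step can be done in Java with java.util.zip.GZIPInputStream. This is a rough equivalent, not what the post ran: the class name GzConcat is hypothetical, and the files are read in sorted path order, which is close to, but not guaranteed identical to, the order in which the shell expands *gz.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, roughly `zcat *gz > out`: decompress every file in `dir`
// whose name ends in "gz" and write the contents, one after another, to `out`.
class GzConcat {
    static void concat(Path dir, Path out) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*gz")) {
            for (Path p : ds) {
                inputs.add(p);
            }
        }
        Collections.sort(inputs); // approximate the shell's sorted glob expansion
        try (OutputStream os = Files.newOutputStream(out)) {
            for (Path p : inputs) {
                try (InputStream in = new GZIPInputStream(Files.newInputStream(p))) {
                    in.transferTo(os); // copy the decompressed bytes straight to the output
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the .gz files; args[1]: output file
        concat(Path.of(args[0]), Path.of(args[1]));
    }
}
```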
3. Check the data format.
4. Put sample.txt into the HDFS file system:
[hadoop@Master input]$ hadoop fs -put /home/hadoop/input/sample.txt /user/hadoop/in/sample.txt
5. Mapper: MinTemperatureMapper.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MinTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Value the data uses for a missing temperature reading
    private static final int MISSING = -9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(0, 4);                        // columns 0-3 (0-based): year
        int airTemperature = Integer.parseInt(line.substring(14, 19).trim()); // columns 14-18: temperature
        if (airTemperature != MISSING) {
            // Emit (year, temperature); the reducer keeps the minimum per year
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
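Before submitting the job, the mapper's column positions can be checked locally without Hadoop. This sketch copies the parsing from MinTemperatureMapper into a plain method; the class name MapperLogic and the sample lines are hypothetical, built only to match the offsets the mapper uses (year in characters 0-3, temperature in characters 14-18, -9999 for a missing value).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical Hadoop-free copy of the parsing in MinTemperatureMapper.map().
class MapperLogic {
    static final int MISSING = -9999;

    // Returns (year, temperature), or empty when the reading is missing.
    static Optional<Map.Entry<String, Integer>> parse(String line) {
        String year = line.substring(0, 4);
        int airTemperature = Integer.parseInt(line.substring(14, 19).trim());
        if (airTemperature == MISSING) {
            return Optional.empty();
        }
        return Optional.of(new SimpleEntry<>(year, airTemperature));
    }

    public static void main(String[] args) {
        // Made-up lines laid out to match the substrings used above
        System.out.println(parse("2003 01 01 00   -56"));
        System.out.println(parse("2003 01 01 03 -9999"));
    }
}
```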
6. Reducer: MinTemperatureReducer.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MinTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Keep the lowest temperature seen for this year
        int minValue = Integer.MAX_VALUE;
        for (IntWritable value : values) {
            minValue = Math.min(minValue, value.get());
        }
        context.write(key, new IntWritable(minValue));
    }
}
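What happens between the mapper and the reducer can be imitated in memory: the shuffle groups all values emitted for the same key, and the reducer then folds each group to its minimum. A minimal sketch with no Hadoop involved; the class name ReduceSimulation and the input pairs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for "shuffle, then MinTemperatureReducer".
class ReduceSimulation {
    static Map<String, Integer> run(List<Map.Entry<String, Integer>> mapOutput) {
        // Shuffle: gather every value emitted for the same key; TreeMap keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Reduce: the same loop as MinTemperatureReducer.reduce(), once per key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int minValue = Integer.MAX_VALUE;
            for (int v : e.getValue()) {
                minValue = Math.min(minValue, v);
            }
            result.put(e.getKey(), minValue);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up (year, temperature) pairs
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("2003", 12), Map.entry("2003", -56), Map.entry("2004", 3));
        System.out.println(run(mapOutput)); // {2003=-56, 2004=3}
    }
}
```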
7. MapReduce job: MinTemperature.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MinTemperature <input path> <output path>");
            System.exit(-1);
        }

        Job job = new Job();
        // Locate the jar containing this class so it can be shipped to the tasks
        job.setJarByClass(MinTemperature.class);
        job.setJobName("Min temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MinTemperatureMapper.class);
        job.setReducerClass(MinTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
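A possible refinement the post does not use: because the minimum of per-split minima equals the overall minimum, MinTemperatureReducer could also be registered as a combiner with job.setCombinerClass(MinTemperatureReducer.class), reducing the data sent from mappers to reducers. Hadoop may run a combiner zero, one or several times, so this is only safe for functions such as min where that does not change the result. The plain-Java check below only illustrates the arithmetic; the class name CombinerCheck and the split contents are made up.

```java
import java.util.List;

// Hypothetical check: "min of per-split minima" (combiner + reducer)
// versus "min of all values" (reducer alone).
class CombinerCheck {
    static int min(Iterable<Integer> values) {
        int m = Integer.MAX_VALUE; // same starting value as MinTemperatureReducer
        for (int v : values) {
            m = Math.min(m, v);
        }
        return m;
    }

    // Combiner step: one partial minimum per map split; reducer step: min of those.
    static int minViaCombiner(List<List<Integer>> splits) {
        int m = Integer.MAX_VALUE;
        for (List<Integer> split : splits) {
            m = Math.min(m, min(split));
        }
        return m;
    }

    // No combiner: the reducer sees every value from every split.
    static int minDirect(List<List<Integer>> splits) {
        return splits.stream()
                .flatMap(List::stream)
                .mapToInt(Integer::intValue)
                .min()
                .orElse(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = List.of(List.of(12, -56, 30), List.of(-7, 4), List.of(0));
        System.out.println(minViaCombiner(splits) + " " + minDirect(splits));
    }
}
```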
8. Compile the classes and package them into a jar:
[hadoop@Master myclass]$ javac -classpath /usr/hadoop/hadoop-core-1.2.1.jar MinTemperature*.java
[hadoop@Master myclass]$ jar cvf MinTemperature.jar MinTemperature*.class
added manifest
adding: MinTemperature.class(in = 1417) (out= 799)(deflated 43%)
adding: MinTemperatureMapper.class(in = 1740) (out= 722)(deflated 58%)
adding: MinTemperatureReducer.class(in = 1664) (out= 707)(deflated 57%)
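To confirm which classes actually ended up in the jar (the same list that jar tf MinTemperature.jar prints), the jar can also be opened from Java with java.util.jar.JarFile. A small sketch; the class name JarLister is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: list the .class entries in a jar, e.g. to check that
// MinTemperatureMapper and MinTemperatureReducer were packaged with MinTemperature.
class JarLister {
    static List<String> classEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry e : Collections.list(jar.entries())) {
                if (e.getName().endsWith(".class")) {
                    names.add(e.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar, e.g. MinTemperature.jar
        classEntries(args[0]).forEach(System.out::println);
    }
}
```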
9. Run the job:
[hadoop@Master myclass]$ hadoop jar /usr/hadoop/myclass/MinTemperature.jar MinTemperature /user/hadoop/in/sample.txt ./out2
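The job failed with an error.
After spending a long time looking for the cause, I found that I had not deleted the .class files, so the program could not find the class. The fix was to delete the .class files in the myclass directory and keep only the generated jar: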
[hadoop@Master myclass]$ rm MinTemperature*.class
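One way to diagnose this kind of class-not-found problem is to check where the JVM actually loaded a class from, using the standard getProtectionDomain().getCodeSource() calls. This is a generic JDK technique that the post itself did not use; the helper name WhereLoaded is hypothetical. Core JDK classes loaded by the bootstrap loader report no code source.

```java
import java.security.CodeSource;

// Hypothetical helper: report where a class was loaded from (a jar file or a
// directory of loose .class files), or "no code source" when none is recorded.
class WhereLoaded {
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "no code source"; // e.g. java.lang.String from the bootstrap loader
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(locationOf(WhereLoaded.class));
        System.out.println(locationOf(String.class));
    }
}
```

Calling locationOf(MinTemperature.class) from the driver should show whether the class came from MinTemperature.jar or from the myclass directory. If it came from the directory, setJarByClass has no jar to ship to the tasks, which would be consistent with the tasks failing to find the mapper and reducer classes.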
10. View the results.
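In meteorology, the day boundary is 20:00 (Beijing time): one day runs from 20:00 today to 20:00 tomorrow. The daily maximum temperature, however, covers the period from 02:00 to 02:00. The maximum is observed at 02:00 each day, and after the reading the maximum thermometer is reset (returned to its starting state). The daily minimum temperature is the lowest value in the 24 hours from 14:00 to 14:00, and the thermometer is reset after the 14:00 observation. The daily maximum usually occurs during the day, most often in the afternoon, and the daily minimum usually occurs at night, most often after midnight. There are exceptions, however, especially when cold air moves in or the temperature rises suddenly and sharply; in those cases the extremes occur at different times.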
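The day windows described above (20:00 to 20:00 for the meteorological day, 14:00 to 14:00 for the daily minimum, 02:00 to 02:00 for the daily maximum) can be expressed as a small date calculation. This sketch rests on assumptions the text does not spell out: each window is labelled with the date on which it ends, an observation exactly at the boundary belongs to the window it closes, and timestamps in UTC are converted to Beijing time (zone Asia/Shanghai) first. The class name MeteoDay is hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper: assign an observation to a "day" that ends at a fixed
// clock time. Assumed conventions: a window is labelled by the date it ends on,
// and an observation exactly at the boundary belongs to the window it closes.
class MeteoDay {
    static final ZoneId BEIJING = ZoneId.of("Asia/Shanghai");

    static LocalDate dayFor(LocalDateTime beijingTime, LocalTime boundary) {
        LocalDate date = beijingTime.toLocalDate();
        // After the boundary, the observation falls into the window ending tomorrow
        return beijingTime.toLocalTime().isAfter(boundary) ? date.plusDays(1) : date;
    }

    // For timestamps recorded in UTC (an assumption about the input data)
    static LocalDate dayForUtc(LocalDateTime utcTime, LocalTime boundary) {
        LocalDateTime beijing = utcTime.atOffset(ZoneOffset.UTC)
                .atZoneSameInstant(BEIJING)
                .toLocalDateTime();
        return dayFor(beijing, boundary);
    }

    public static void main(String[] args) {
        LocalTime eightPm = LocalTime.of(20, 0);
        System.out.println(dayFor(LocalDateTime.of(2003, 1, 1, 21, 0), eightPm)); // 2003-01-02
        System.out.println(dayFor(LocalDateTime.of(2003, 1, 1, 20, 0), eightPm)); // 2003-01-01
    }
}
```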
If you need the Cloudera Hadoop 4 hands-on course series, you can download it from my cloud drive:
pan.baidu.com/s/1Fv8ZY