Hadoop experiment: finding the minimum temperature in weather data



1. Download part of the data. Since this is only an experiment, just a portion of the 2003 weather data was downloaded.

2. Decompress the archives and redirect the output into a single file with zcat *gz > sample.txt

[hadoop@Master test_data]$ zcat *gz > /home/hadoop/input/sample.txt

3. Check the data format

4. Put the file sample.txt into the HDFS file system

[hadoop@Master input]$ hadoop fs -put /home/hadoop/input/sample.txt  /user/hadoop/in/sample.txt

5. Mapper: MinTemperatureMapper.java


 import java.io.IOException;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Mapper;

 public class MinTemperatureMapper
   extends Mapper<LongWritable, Text, Text, IntWritable>
 {

   // Value used in the data set to mark a missing temperature reading
   private static final int MISSING = -9999;

   @Override
   public void map(LongWritable key, Text value, Context context)
         throws IOException, InterruptedException {

     String line = value.toString();
     // The first four characters of each record are the year
     String year = line.substring(0, 4);
     // The air temperature is read from the fixed-width columns 14..18 (0-based)
     int airTemperature = Integer.parseInt(line.substring(14, 19).trim());

     // Emit (year, temperature) only for valid readings
     if (airTemperature != MISSING) {
       context.write(new Text(year), new IntWritable(airTemperature));
     }
   }
 }
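
To try the mapper's parsing step without a cluster, the same substring logic can be run in plain Java. This is only a minimal sketch: the class TemperatureLineParser and the sample record are hypothetical, made up to match the fixed-width layout the mapper assumes (year in columns 0..3, temperature in columns 14..18, -9999 for a missing value).

```java
// Plain-Java sketch of the mapper's parsing step; no Hadoop classes are needed.
final class TemperatureLineParser {
    // Sentinel that the mapper treats as "no reading"
    static final int MISSING = -9999;

    // Same offset as MinTemperatureMapper: the year is in [0, 4)
    static String parseYear(String line) {
        return line.substring(0, 4);
    }

    // Same offset as MinTemperatureMapper: the temperature is in [14, 19)
    static int parseTemperature(String line) {
        return Integer.parseInt(line.substring(14, 19).trim());
    }

    static boolean isValid(int temperature) {
        return temperature != MISSING;
    }

    public static void main(String[] args) {
        // Hypothetical record, invented to fit the assumed fixed-width layout
        String line = "2003 01 01 00   -94  -150";
        int t = parseTemperature(line);
        System.out.println(parseYear(line) + "\t" + t + "\tvalid=" + isValid(t));
    }
}
```

Running a few real lines from sample.txt through parseTemperature is a quick way to confirm that the column offsets match the downloaded files before submitting the job.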

6. Reducer: MinTemperatureReducer.java

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MinTemperatureReducer
  extends Reducer<Text, IntWritable, Text, IntWritable>
{

  @Override
  public void reduce(Text key, Iterable<IntWritable> values,Context context)
          throws IOException, InterruptedException
        {

                int minValue= Integer.MAX_VALUE;
                for (IntWritable value : values)
                {
                        minValue= Math.min(minValue, value.get());
                }
                context.write(key, new IntWritable(minValue));
        }
}
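
What the mapper and reducer compute together can be checked locally with ordinary collections. The sketch below is not how Hadoop runs the job; it only mimics the result (minimum valid temperature per year) in a single process. The class LocalMinTemperature and the sample records are hypothetical and assume the same fixed-width layout as the mapper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process imitation of the map and reduce steps, for sanity checks only.
final class LocalMinTemperature {
    static final int MISSING = -9999;

    static Map<String, Integer> minByYear(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: extract the year and the temperature
            String year = line.substring(0, 4);
            int t = Integer.parseInt(line.substring(14, 19).trim());
            if (t == MISSING) {
                continue; // skip missing readings, as the mapper does
            }
            // "Reduce" step: keep the smallest value seen for each year
            result.merge(year, t, Math::min);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical records, invented to fit the assumed layout
        List<String> sample = Arrays.asList(
            "2003 01 01 00   -94",
            "2003 01 01 01 -9999",
            "2003 01 01 02  -120",
            "2004 01 01 00    35");
        System.out.println(minByYear(sample));
    }
}
```

Comparing this output for a small slice of sample.txt with the job's output for the same slice is a cheap way to validate the MapReduce code.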


7. MapReduce job: MinTemperature.java

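import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;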

public class MinTemperature
{
        public static void main(String[] args) throws Exception
        {
                if (args.length!= 2)
                {
                        System.err.println("Usage: MinTemperature <input path> <output path>");
                        System.exit(-1);
                }
                Job job= new Job();
                job.setJarByClass(MinTemperature.class);
                job.setJobName("Min temperature");
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                job.setMapperClass(MinTemperatureMapper.class);
                job.setReducerClass(MinTemperatureReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
}


8. Compile and package into a jar


[hadoop@Master myclass]$ javac -classpath /usr/hadoop/hadoop-core-1.2.1.jar  MinTemperature*.java


[hadoop@Master myclass]$ jar cvf MinTemperature.jar MinTemperature*.class
added manifest
adding: MinTemperature.class(in = 1417) (out= 799)(deflated 43%)
adding: MinTemperatureMapper.class(in = 1740) (out= 722)(deflated 58%)
adding: MinTemperatureReducer.class(in = 1664) (out= 707)(deflated 57%)


9. Run the job

[hadoop@Master myclass]$ hadoop jar /usr/hadoop/myclass/MinTemperature.jar MinTemperature  /user/hadoop/in/sample.txt  ./out2


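The run failed with an error.

After a long search, the cause turned out to be the leftover .class files: with them still in place, the program could not find its classes. The fix was to delete the .class files in the myclass directory and keep only the generated jar.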

[hadoop@Master myclass]$ rm MinTemperature*.class


10. View the result










