Setting a Custom Class Variable in Hadoop Configuration
Environment: Hadoop 1.0.4
When writing Hadoop MapReduce jobs, you sometimes wish you could store a custom class instance in the Configuration. Looking at the Configuration API, the set methods mostly handle simple types such as int, String, or double. Is there a method for setting a custom class, something like setClass? As it turns out, there is.
The API documentation says:
```
setClass

public void setClass(String name, Class<?> theClass, Class<?> xface)

Set the value of the name property to the name of a theClass implementing
the given interface xface. An exception is thrown if theClass does not
implement the interface xface.

Parameters:
    name - property name.
    theClass - property value.
    xface - the interface implemented by the named class.
```

But that doesn't look quite right: this method stores a class *type*, not an instance. A quick search confirms it; the usage is as follows:
```java
/**
 * Store and retrieve a class type by property name.
 */
public static void testSetClass() {
    Configuration conf = new Configuration();
    conf.setClass("mapout", LongWritable.class, Writable.class);
    Class<?> b = conf.getClass("mapout", Writable.class);
    System.out.println(b);

    conf.setClass("myClass", User.class, Person.class);
    b = conf.getClass("myClass", Person.class);
    System.out.println(b);
}
```

This prints (User is a subclass of Person):
```
class org.apache.hadoop.io.LongWritable
class org.fz.testconfig.User
```

So setClass really does only store a type, which is of limited use. If all you need in a Mapper or Reducer setup method is a type, you can also obtain one like this:
```java
/**
 * Obtain a class type from its fully qualified name.
 * @throws ClassNotFoundException
 */
public static void testGetClassByName() throws ClassNotFoundException {
    Configuration conf = new Configuration();
    Class<?> a = conf.getClassByName("org.apache.hadoop.io.LongWritable");
    System.out.println(a);

    a = conf.getClassByName("org.fz.testconfig.User");
    System.out.println(a);
}
```

This prints:
```
class org.apache.hadoop.io.LongWritable
class org.fz.testconfig.User
```

getClassByName also recovers a class, but only the type itself, never an instance with its field values. So how can we pass an actual object through the Configuration?
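To see why a Class object alone cannot carry instance state, here is a minimal standalone sketch (the class name and field are illustrative, not from the post): reflection recovers the type, and any instance built from the type starts from the default constructor.

```java
// Illustrates why recovering a Class object (as Configuration.getClass or
// getClassByName does) cannot recover instance state: reflection yields the
// type, and any instance built from it starts from the default constructor.
public class ClassVsInstance {
    public static class User {
        public int age = 11; // default value assigned at construction
    }

    public static void main(String[] args) throws Exception {
        User original = new User();
        original.age = 42; // state we would like to transport

        // What the Configuration can give us back: only the type...
        Class<?> cls = Class.forName("ClassVsInstance$User");
        // ...and from the type, only a freshly constructed instance.
        User fresh = (User) cls.getDeclaredConstructor().newInstance();

        System.out.println(cls.getName()); // ClassVsInstance$User
        System.out.println(fresh.age);     // 11, not 42 -- the state is lost
    }
}
```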
An object's fields can be converted to a JSON string, and a string can be stored with Configuration's set method and read back with get. So we can write a small helper class that converts an object to a string and a string back to an object. The JSON library used here is Alibaba's fastjson, available at https://github.com/alibaba/fastjson.
The helper class looks like this:
```java
package org.fz.testconfig;

import org.apache.hadoop.conf.Configuration;

public class ConfigurationUtil {

    /**
     * Store a custom object in the Configuration.
     * @param key property name
     * @param conf Configuration
     * @param userDefineObject the custom object
     */
    public static void setClass(String key, Configuration conf, Object userDefineObject) {
        String userStr = com.alibaba.fastjson.JSON.toJSON(userDefineObject).toString();
        conf.set(key, userStr);
    }

    /**
     * Retrieve a custom object from the Configuration.
     * @param key property name
     * @param conf Configuration
     * @param classType the expected type
     * @return the deserialized object
     */
    public static Object getClass(String key, Configuration conf, Class<?> classType) {
        String str = conf.get(key);
        return com.alibaba.fastjson.JSON.parseObject(str, classType);
    }
}
```
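The same object-to-string round-trip can also be done without fastjson, using standard Java serialization plus Base64 so the result is a plain string that fits in a Configuration value. This is only a dependency-free sketch, not the approach used in this post: it assumes the object implements Serializable, the names are illustrative rather than part of any Hadoop API, and java.util.Base64 requires Java 8+.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Dependency-free alternative to the fastjson round-trip above: serialize the
// object with standard Java serialization and Base64-encode the bytes so they
// can be stored as a Configuration string value. Assumes Serializable.
public class SerialStringUtil {

    public static String toConfString(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    public static Object fromConfString(String s) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(s);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }
}
```

Unlike the JSON approach, this also preserves fields that have no getters, at the cost of requiring Serializable and producing an opaque value.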
It can be used like this:
```java
/**
 * Round-trip a custom object through the Configuration.
 */
public static void testSetClassReal() {
    Configuration conf = new Configuration();
    User u = new User("test", 11);

    // Equivalent inline form:
    // String userStr = com.alibaba.fastjson.JSON.toJSON(u).toString();
    // conf.set("USER", userStr);
    ConfigurationUtil.setClass("USER", conf, u);

    // Equivalent inline form:
    // String returnStr = conf.get("USER");
    // User user = com.alibaba.fastjson.JSON.parseObject(returnStr, User.class);
    User user = (User) ConfigurationUtil.getClass("USER", conf, User.class);

    System.out.println(user.getAge());
    System.out.println(user.getPersons());
}
```
This prints:
```
11
[org.fz.testconfig.Person@10358032]
```

For completeness, here are Person and User:
```java
package org.fz.testconfig;

import java.util.ArrayList;
import java.util.List;

public class User extends Person {

    private String name;
    private int age;
    private int[] arr;
    private List<Person> persons = new ArrayList<Person>();

    public User() {}

    public User(String name, int age) {
        this.age = age;
        this.name = name;
        this.persons.add(new Person("p1"));
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public int[] getArr() { return arr; }
    public void setArr(int[] arr) { this.arr = arr; }
    public List<Person> getPersons() { return persons; }
    public void setPersons(List<Person> persons) { this.persons = persons; }
}
```
```java
package org.fz.testconfig;

public class Person {

    private String name;

    public Person() {}
    public Person(String name) { this.name = name; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```
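Note that fastjson relies on the public no-arg constructor and the getter/setter pairs above to rebuild the objects. The `Person@10358032` seen in the earlier output is just Object's default toString; overriding toString on Person makes the deserialized list print readably. A minimal sketch (the output format chosen here is my own, not from the post):

```java
// Same Person as above, with toString overridden so a deserialized
// List<Person> prints readably instead of as "org.fz.testconfig.Person@hash".
public class Person {

    private String name;

    public Person() {}
    public Person(String name) { this.name = name; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    @Override
    public String toString() { return "Person(name=" + name + ")"; }
}
```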
So this works with a bare Configuration, but does it still work inside an actual MapReduce job? Let's test it:
Write the following driver:
```java
package org.fz.testconfig;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoadDriverTest {

    private static Logger logger = LoggerFactory.getLogger(LoadDriverTest.class);

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        if (args.length != 2) {
            return;
        }
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "master:9001");
        conf.set("fs.default.name", "master:9000");

        // Put the custom object into the job configuration.
        User user = new User("user1", 22);
        ConfigurationUtil.setClass("USER", conf, user);

        Job job = new Job(conf, "text2vectorWritable with input:" + args[0]);
        job.setMapperClass(LoadMapper.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        job.setJarByClass(LoadDriverTest.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        if (!job.waitForCompletion(true)) {
            throw new InterruptedException(
                "Text to VectorWritable Job failed processing " + args[0]);
        }
    }

    public static class LoadMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

        private User user = null;

        @Override
        public void setup(Context cxt) {
            // Read the custom object back out of the job configuration.
            user = (User) ConfigurationUtil.getClass("USER", cxt.getConfiguration(), User.class);
            logger.info("user:" + user.getAge() + "," + user.getName());
        }

        @Override
        public void map(LongWritable key, Text value, Context cxt)
                throws IOException, InterruptedException {
            String v = user.getName() + "," + user.getAge() + "," + user.getPersons().toString();
            cxt.write(key, new Text(v));
        }
    }
}
```

Package this into a jar and put it, together with the fastjson jar downloaded earlier, into the Hadoop cluster's lib directory. Then run the job and check the task log and the output files.
First, the log output: it shows that the user information was indeed read back in the Mapper. At this point there is really no need to check the output file as well, but a quick look confirms that it contains the same values.
http://blog.csdn.net/fansy1990