MapReduce Programming: Self-Join

Shared by LinuxBoy on 2019-03-27 06:03:53

SQL Self-Join

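A SQL self-join, that is, joining a table with itself, can solve many problems. The example below uses a self-join to capture a logical relationship between the columns of a table, more precisely a hierarchical one.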

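Take the table cp below, which stores child-parent relationships. The task is to find, with a single SQL query, every grandchild and grandparent pair, i.e. every grandchild -> grandparent.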

+-------+--------+
| child | parent |
+-------+--------+
| tom   | jack   |
| hello | jack   |
| tom   | tong   |
| jack  | gao    |
| tom   | jack   |
| hello | jack   |
| tong  | haha   |
+-------+--------+

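The following query does it. DISTINCT is needed because the table contains duplicate rows: (tom, jack) and (hello, jack) each appear twice, so without it tom gao and hello gao would each be returned twice.

select distinct t1.child, t2.parent from cp t1, cp t2 where t1.parent = t2.child;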

The result is:

+-------+--------+
| child | parent |
+-------+--------+
| tom   | gao    |
| hello | gao    |
| tom   | haha   |
+-------+--------+
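To make the semantics of the query concrete, here is a minimal sketch in Java (the language of the MapReduce code below) of what the database computes: a nested loop over every (t1, t2) pair of rows, keeping the pairs where t1.parent equals t2.child. The class and method names are hypothetical, and a LinkedHashSet stands in for DISTINCT.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SelfJoinSql {
    // Each row is {child, parent}, mirroring the cp table above (duplicates included)
    static final String[][] CP = {
        {"tom", "jack"}, {"hello", "jack"}, {"tom", "tong"}, {"jack", "gao"},
        {"tom", "jack"}, {"hello", "jack"}, {"tong", "haha"}
    };

    // select distinct t1.child, t2.parent from cp t1, cp t2 where t1.parent = t2.child
    static List<String> grandPairs(String[][] cp) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String[] t1 : cp) {                    // every row of the left copy...
            for (String[] t2 : cp) {                // ...paired with every row of the right copy
                if (t1[1].equals(t2[0])) {          // join condition: t1.parent = t2.child
                    result.add(t1[0] + " " + t2[1]);
                }
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        for (String pair : grandPairs(CP)) {
            System.out.println(pair);
        }
    }
}
```

Swapping the LinkedHashSet for a plain ArrayList gives five rows instead of three, because each duplicate (tom, jack) and (hello, jack) row matches (jack, gao) again.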

MapReduce Implementation

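In effect, the SQL above takes the Cartesian product of cp with itself and keeps only the rows where t1.parent = t2.child. How can this be done with MapReduce?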

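The idea is to use the join column as the map output key, so that records with equal join values meet in the same reduce call. The map output therefore has to carry information from both the left and the right table (t1 and t2).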

Given the following input file, where each line holds a child and its parent separated by a space:

Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Tom

The map output is:

Tom parent_Lucy
Lucy child_Tom
Tom parent_Jack
Jack child_Tom
.....

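In other words, every record is emitted twice, once for each of the two tables: keyed by its parent with a child_ tag it plays the role of t1, and keyed by its child with a parent_ tag it plays the role of t2.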

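The map function, a method of a Mapper<Object, Text, Text, Text> subclass (Mapper from org.apache.hadoop.mapreduce, Text from org.apache.hadoop.io):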

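		// Emit every "child parent" line twice, once for each side of the self-join
		@Override
		public void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {
			String[] strs = value.toString().trim().split("\\s+");
			if (strs.length == 2) {
				// t2 side: keyed by the child, carries its parent
				context.write(new Text(strs[0]), new Text("parent_" + strs[1]));
				// t1 side: keyed by the parent, carries its child
				context.write(new Text(strs[1]), new Text("child_" + strs[0]));
			}
		}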
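The map step and the shuffle that follows it can be simulated locally without Hadoop. The sketch below, with hypothetical class and method names, applies the same tagging on plain strings and then groups the tagged values by key in a TreeMap. This is only an approximation of the framework's sort-and-group phase: Hadoop also sorts keys, but it does not guarantee the order of values within a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SelfJoinShuffle {
    // Map step on plain strings: "child parent" -> (child, parent_<parent>) and (parent, child_<child>)
    static List<String[]> map(String line) {
        List<String[]> out = new ArrayList<>();
        String[] strs = line.trim().split("\\s+");
        if (strs.length == 2) {
            out.add(new String[] {strs[0], "parent_" + strs[1]}); // t2 side, keyed by the child
            out.add(new String[] {strs[1], "child_" + strs[0]});  // t1 side, keyed by the parent
        }
        return out;
    }

    // Shuffle stand-in: collect every emitted value under its key; TreeMap keeps keys sorted
    static Map<String, List<String>> shuffle(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            for (String[] kv : map(line)) {
                groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack",
                "Lucy Mary", "Lucy Ben", "Jack Alice", "Jack Jesse", "Terry Tom");
        // One line per key: the value list a single reduce call would receive
        shuffle(input).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

For the key Lucy the group is [child_Tom, child_Jone, parent_Mary, parent_Ben]: Lucy's children on one side and Lucy's parents on the other, so pairing the two sides yields Tom and Jone with their grandparents Mary and Ben.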

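The reduce function, a method of a Reducer<Text, Text, Text, Text> subclass (it also uses java.util.List and java.util.ArrayList):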

		// For one person (the join key), pair each of their children with each of their parents
		@Override
		public void reduce(Text key, Iterable<Text> values, Context context)
				throws IOException, InterruptedException {
			List<Text> grandchildren = new ArrayList<Text>();
			List<Text> grandparents = new ArrayList<Text>();
			for (Text name : values) {
				String s = name.toString().trim();
				// Copy into new Text objects: Hadoop reuses the same Text instance while iterating
				if (s.startsWith("child_")) {
					grandchildren.add(new Text(s.substring("child_".length())));
				} else if (s.startsWith("parent_")) {
					grandparents.add(new Text(s.substring("parent_".length())));
				}
			}
			// Cross product: every grandchild with every grandparent under this key
			for (Text gchild : grandchildren) {
				for (Text gparent : grandparents) {
					context.write(gchild, gparent);
				}
			}
		}
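This reduce emits one pair per (child_, parent_) combination, so duplicate input rows, like the repeated (tom, jack) and (hello, jack) rows in the cp table, produce duplicate output pairs, just as a self-join without DISTINCT would. A minimal sketch of a deduplicating variant, written on plain Java strings rather than Hadoop Text so it can run locally (class and method names are hypothetical), collects each side into a LinkedHashSet:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DedupReduce {
    // Reduce logic for a single key, on plain strings instead of Hadoop Text.
    // LinkedHashSets drop repeated child_/parent_ values (keeping first-seen order),
    // so duplicate input rows yield each pair only once for this key.
    static List<String> reduce(Iterable<String> values) {
        Set<String> grandchildren = new LinkedHashSet<>();
        Set<String> grandparents = new LinkedHashSet<>();
        for (String v : values) {
            String s = v.trim();
            if (s.startsWith("child_")) {
                grandchildren.add(s.substring("child_".length()));
            } else if (s.startsWith("parent_")) {
                grandparents.add(s.substring("parent_".length()));
            }
        }
        // Cross product of the two deduplicated sides
        List<String> out = new ArrayList<>();
        for (String gc : grandchildren) {
            for (String gp : grandparents) {
                out.add(gc + " " + gp);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values the key "jack" would receive if the cp rows were fed through the same mapper
        List<String> jack = Arrays.asList("child_tom", "child_hello", "child_tom", "child_hello", "parent_gao");
        System.out.println(reduce(jack)); // [tom gao, hello gao]
    }
}
```

This only removes duplicates within a single key. If a grandchild reaches the same grandparent through two different parents, the pair is emitted by two different reduce calls, and removing that kind of duplicate needs a second pass, for example another job keyed on the whole pair.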

The output is:

Jone Jesse
Jone Alice
Tom Jesse
Tom Alice
Tom Ben
Tom Mary
Jone Ben
Jone Mary
Terry Jack
Terry Lucy
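To check the job's logic without a cluster, the whole pipeline can be simulated in memory. This is a sketch under assumptions: the class name is hypothetical and a TreeMap stands in for Hadoop's sort-and-group phase. It produces the same ten pairs listed above, though the order of pairs within a key can differ from a real run, because Hadoop does not guarantee the order of values within a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalSelfJoin {
    // Runs map -> group by key -> reduce in memory, mirroring the job's logic
    static List<String> run(List<String> lines) {
        // Map + shuffle: tag each record for both sides and group by the join key
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] strs = line.trim().split("\\s+");
            if (strs.length != 2) {
                continue; // skip malformed lines, as the mapper does
            }
            groups.computeIfAbsent(strs[0], k -> new ArrayList<>()).add("parent_" + strs[1]);
            groups.computeIfAbsent(strs[1], k -> new ArrayList<>()).add("child_" + strs[0]);
        }
        // Reduce: cross product of the child_ and parent_ values under each key
        List<String> output = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> grandchildren = new ArrayList<>();
            List<String> grandparents = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("child_")) {
                    grandchildren.add(v.substring(6));
                } else if (v.startsWith("parent_")) {
                    grandparents.add(v.substring(7));
                }
            }
            for (String gc : grandchildren) {
                for (String gp : grandparents) {
                    output.add(gc + " " + gp);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack",
                "Lucy Mary", "Lucy Ben", "Jack Alice", "Jack Jesse", "Terry Tom");
        run(input).forEach(System.out::println);
    }
}
```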
