Hadoop learning: rewriting and running the WordCount program in C++ (hadoop-wordcount)


1. Job submission command:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input /input/wordcount/sample.txt -output /output/wordcount -program /bin/wordcount
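Before submitting, both the input file and the compiled binary must already exist on HDFS; the -program argument refers to an HDFS path, not a local one. A typical staging sequence might look like the following sketch (paths taken from the command above; adjust them to your cluster layout):

```shell
# Stage the input file and the compiled wordcount binary on HDFS.
# Paths match the pipes command above; adjust for your cluster.
hadoop fs -mkdir -p /input/wordcount
hadoop fs -put sample.txt /input/wordcount/sample.txt
hadoop fs -put wordcount /bin/wordcount
# The job fails if the output directory already exists, so clear it first:
hadoop fs -rm -r -f /output/wordcount
```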

2. Source code:

wordcount.h

#include <stdint.h>
#include <string>
#include <vector>

#include "Pipes.hh"
#include "TemplateFactory.hh"
#include "StringUtils.hh"

using namespace std;

class WordcountMapper : public HadoopPipes::Mapper
{
  public:
    WordcountMapper(HadoopPipes::TaskContext& context);
    vector<string> split(const string& src, const string& separator);
    // override the map method
    void map(HadoopPipes::MapContext& context);
};

class WordcountReducer : public HadoopPipes::Reducer
{
  public:
    WordcountReducer(HadoopPipes::TaskContext& context);
    // override the reduce method
    void reduce(HadoopPipes::ReduceContext& context);
};

wordcount.cpp


#include "wordcount.h"

WordcountMapper::WordcountMapper(HadoopPipes::TaskContext& context)
{
}

void WordcountMapper::map(HadoopPipes::MapContext& context)
{
  // Emit <word, "1"> for every word in the input line.
  const int count = 1;
  string line = context.getInputValue();
  vector<string> wordVec = split(line, " ");
  for (unsigned i = 0; i < wordVec.size(); i++)
  {
    context.emit(wordVec[i], HadoopUtils::toString(count));
  }
}

vector<string> WordcountMapper::split(const string& src, const string& separator)
{
  // Split src on any character in separator, skipping empty tokens
  // (the original version emitted an empty token for a leading separator).
  vector<string> dest;
  string::size_type start = src.find_first_not_of(separator, 0);
  string::size_type index = src.find_first_of(separator, start);
  while (start != string::npos)
  {
    dest.push_back(src.substr(start, index - start));
    start = src.find_first_not_of(separator, index);
    index = src.find_first_of(separator, start);
  }
  return dest;
}
 
WordcountReducer::WordcountReducer(HadoopPipes::TaskContext& context)
{
}

void WordcountReducer::reduce(HadoopPipes::ReduceContext& context)
{
  // Sum the counts emitted for a single key and emit <word, total>.
  int wSum = 0;
  while (context.nextValue())
  {
    wSum += HadoopUtils::toInt(context.getInputValue());
  }
  context.emit(context.getInputKey(), HadoopUtils::toString(wSum));
}

main.cpp

/*
** In the Hadoop MapReduce framework, map prepares the data and reduce aggregates it;
** the client program communicates with the Hadoop cluster.
** Java programs typically talk to the framework via standard streams (streaming),
** while C++ Pipes programs use a socket.
** map runs on the data nodes, so it is data-local with no network traffic;
** reduce is fed by map output (map output = reduce input), so its network usage is higher.
** Java parameters can be int or String, but the C++ Pipes interface only passes
** strings; this simplifies the interface but often requires manual conversion.
***********************************
** Runs word counting on the Hadoop cluster.
** Task entry point: runTask
** Job submission script: mapredjob.sh
** HDFS input file:  /input/wordcount/sample.txt
** HDFS output path: /output/wordcount
*/
#include "wordcount.h"

int main(int argc, char *argv[]) 
{  
  return HadoopPipes::runTask(HadoopPipes::TemplateFactory<WordcountMapper, WordcountReducer>());  
}  

Makefile:

.SUFFIXES: .h .c .cpp .o

CC = g++
CPPFLAGS = -m64
RM = rm -f
SRCS = wordcount.cpp main.cpp
PROGRAM = wordcount
OBJS = $(SRCS:.cpp=.o)

INC_PATH = -I$(HADOOP_DEV_HOME)/include
LIB_PATH = -L$(HADOOP_DEV_HOME)/lib/native
LIBS = -lhadooppipes -lcrypto -lhadooputils -lpthread

# $^ expands to all prerequisites, $@ to the target
# (the original used $?, which only lists prerequisites newer than the target
# and would drop object files from the link on a partial rebuild)
$(PROGRAM): $(OBJS)
	$(CC) $^ -Wall $(LIB_PATH) $(LIBS) -g -O2 -o $@

.cpp.o:
	$(CC) $(CPPFLAGS) $(INC_PATH) -c $< -o $@

.PHONY: clean
clean:
	$(RM) $(PROGRAM) $(OBJS)
	
	


Input data:

Happiness is not about being immortal nor having food or rights in one's hand. It's about having each tiny wish come true, or having something to eat when you are hungry or having someone's love when you need love
Happiness is not about being immortal nor having food or rights in one's hand. It's about having each tiny wish come true, or having something to eat when you are hungry or having someone's love when you need love
Happiness is not about being immortal nor having food or rights in one's hand. It's about having each tiny wish come true, or having something to eat when you are hungry or having someone's love when you need love

Output:

Happiness 3
It��s 3
about 6
are 3
being 3
come 3
each 3
eat 3
food 3
hand. 3
having 12
hungry 3
immortal 3
in 3
is 3
love 6
need 3
nor 3
not 3
one's 3
or 9
rights 3
someone's 3
something 3
tiny 3
to 3
true, 3
when 6
wish 3
you 6



