Hadoop Streaming Job failed error in python

From this guide, I have successfully run the sample exercise. But when I run my own MapReduce job, it fails with the following error:
ERROR streaming.StreamJob: Job not Successful!
10/12/16 17:13:38 INFO streaming.StreamJob: killJob...
Streaming Job Failed!

Error from the log file:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Mapper.py

import sys

i = 0
for line in sys.stdin:
    i += 1
    count = {}
    for word in line.strip().split():
        count[word] = count.get(word, 0) + 1
    for word, weight in count.items():
        print '%s\t%s:%s' % (word, str(i), str(weight))
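The mapper emits one `word<TAB>lineno:count` record per distinct word on each input line. As a quick local sanity check, here is a hedged Python 3 port of the same logic (the original script targets Python 2) run over a sample line:

```python
import io

def map_lines(stream):
    # Python 3 sketch of mapper.py: emit "word<TAB>line_no:count"
    # for every distinct word on every input line.
    out = []
    for i, line in enumerate(stream, start=1):
        count = {}
        for word in line.strip().split():
            count[word] = count.get(word, 0) + 1
        for word, weight in count.items():
            out.append('%s\t%s:%s' % (word, i, weight))
    return out

sample = io.StringIO("the cat sat on the mat\n")
for record in map_lines(sample):
    print(record)
```

Feeding the mapper a line with a repeated word ("the") shows the per-line count in action: "the" gets weight 2, every other word weight 1.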

Reducer.py

import sys

keymap = {}
o_tweet = "2323"
id_list = []
for line in sys.stdin:
    tweet, tw = line.strip().split()
    #print tweet, o_tweet, tweet_id, id_list
    tweet_id, w = tw.split(':')
    w = int(w)
    if tweet.__eq__(o_tweet):
        for i, wt in id_list:
            print '%s:%s\t%s' % (tweet_id, i, str(w + wt))
        id_list.append((tweet_id, w))
    else:
        id_list = [(tweet_id, w)]
        o_tweet = tweet
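The reducer assumes Hadoop's shuffle has already sorted the mapper output by word, so all records for one word arrive consecutively. A hedged Python 3 port of the same logic (a local-testing sketch, not the submitted script) makes that behavior easy to check:

```python
import io

def reduce_stream(stream):
    # Python 3 sketch of reducer.py: input must arrive sorted by word,
    # exactly as Hadoop delivers it to a streaming reducer.
    out = []
    o_tweet = "2323"   # sentinel meaning "no word seen yet"
    id_list = []
    for line in stream:
        tweet, tw = line.strip().split()
        tweet_id, w = tw.split(':')
        w = int(w)
        if tweet == o_tweet:
            # Same word as before: pair this line id with every
            # previously seen id and emit the combined weight.
            for i, wt in id_list:
                out.append('%s:%s\t%s' % (tweet_id, i, w + wt))
            id_list.append((tweet_id, w))
        else:
            # New word: reset the running id list.
            id_list = [(tweet_id, w)]
            o_tweet = tweet
    return out

# Two sorted records for the word "the" from lines 1 and 2:
print(reduce_stream(io.StringIO("the\t1:2\nthe\t2:1\n")))
```

For the sample above, the second record ("the" on line 2, weight 1) is paired with the first (line 1, weight 2), producing the single output record `2:1<TAB>3`.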

Edit: command to run the job:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper /home/hadoop/mapper.py -file /home/hadoop/reducer.py -reducer /home/hadoop/reducer.py -input my-input/* -output my-output 

Input is any random sequence of sentences.

Thanks,

asked Dec 16 '10 by db42


1 Answer

Your -mapper and -reducer arguments should be just the script name, not the full path:

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper mapper.py -file /home/hadoop/reducer.py -reducer reducer.py -input my-input/* -output my-output 

The -file options ship your scripts with the job into a folder on HDFS, and each task attempt executes them relative to its working directory ("."). (FYI, if you ever want to add another -file, such as a lookup table, you can open it in Python as if it were in the same directory as your scripts while your script is running in the M/R job.)
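For example, with a hypothetical lookup table shipped via an extra `-file lookup.txt`, the script can open it by bare name, since Hadoop places shipped files in the task's working directory. This sketch simulates that directory locally (the file name and contents are made up for illustration):

```python
import os
import tempfile

# Sketch only: simulate the task attempt's working directory, into
# which Hadoop copies every -file argument (scripts and data alike).
workdir = tempfile.mkdtemp()
os.chdir(workdir)

# Stand-in for a hypothetical lookup table shipped with -file lookup.txt
with open('lookup.txt', 'w') as f:
    f.write('apple\tfruit\nbeet\tvegetable\n')

# Inside mapper.py or reducer.py you would open it by bare name,
# because it sits next to the script in "." at task run time:
lookup = {}
with open('lookup.txt') as f:
    for line in f:
        key, value = line.strip().split('\t')
        lookup[key] = value

print(lookup['apple'])
```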

Also make sure you have run chmod a+x mapper.py and chmod a+x reducer.py so both scripts are executable.
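If you want to verify the execute bits programmatically, a hedged sketch using the standard library (a throwaway file stands in for your script):

```python
import os
import stat
import tempfile

# Sketch: create a throwaway script and give it the execute bits
# that `chmod a+x` would set, then confirm with os.access().
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write('#!/usr/bin/env python\nprint("ok")\n')
    path = f.name

mode = os.stat(path).st_mode
os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

executable = os.access(path, os.X_OK)
print(executable)
os.remove(path)
```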

answered Oct 10 '22 by Joe Stein