Hadoop streaming with C# and Mono : IdentityMapper being used incorrectly

I have mapper and reducer executables written in C#. I want to use these with Hadoop streaming.

This is the command I'm using to create the Hadoop job...

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar 
-input "/user/hduser/ss_waits" 
-output "/user/hduser/ss_waits-output" 
–mapper "mono mapper.exe" 
–reducer "mono reducer.exe" 
-file "mapper.exe" 
-file "reducer.exe"

This is the error encountered by each mapper...

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Based on the call stack, the problem seems to be that the (Java) IdentityMapper class is being used as the mapper, which would explain the type mismatch error. The mapper should have been the executable "mono mapper.exe".

Any ideas why mono mapper.exe is not being used?

The mapper.exe and reducer.exe have the following permissions: -rwxr-xr-x

I am able to successfully execute mono mapper.exe from the Unix command shell and have it read text from stdin and write to stdout.
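For context, Hadoop streaming talks to a mapper purely through stdin and stdout: it feeds input lines to the process and reads back one record per line, treating the text up to the first tab as the key. A minimal sketch of that contract follows, in Python rather than C#. The function name and the rule it applies (first field becomes the key, the value is always 1) are made up for illustration and are not the logic of mapper.exe.

```python
import io
import sys


def run_mapper(stdin, stdout):
    """Read lines from stdin and write one "key<TAB>value" record per line."""
    for line in stdin:
        fields = line.split()
        if not fields:
            # Skip blank or whitespace-only lines.
            continue
        # Hypothetical rule: the first field is the key, the value is 1.
        stdout.write(f"{fields[0]}\t1\n")


# Demo on in-memory sample data so the sketch runs without Hadoop.
sample = io.StringIO("alpha 12\n\nbeta 7\n")
out = io.StringIO()
run_mapper(sample, out)
print(out.getvalue(), end="")

# Used as a real streaming mapper, the call would be:
# run_mapper(sys.stdin, sys.stdout)
```

Any executable that behaves this way from the shell should, in principle, behave the same way under streaming once the job actually invokes it.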

Environment:

  • Ubuntu Server 12.04 LTS (VM running on Azure)
  • Hadoop 1.0.4
  • Mono 2.10
user1793093 asked Nov 02 '12 04:11


1 Answer

Assuming mono is in the PATH, do you need the full path to mapper.exe and reducer.exe? For example:

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar 
-input "/user/hduser/ss_waits" 
-output "/user/hduser/ss_waits-output" 
-mapper "mono /path/to/mapper.exe" 
-reducer "mono /path/to/reducer.exe" 
-file "mapper.exe" 
-file "reducer.exe"
joncham answered Sep 21 '22 17:09
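One more thing worth ruling out: in the command quoted in the question, the mapper and reducer options begin with an en-dash (–) rather than an ASCII hyphen (-). Whether that is what was typed in the shell or only how the page rendered it is not known. Hadoop streaming options begin with an ASCII hyphen, and a mapper option that is not recognized would be consistent with the job running IdentityMapper instead of the executable. The sketch below is a hypothetical helper, not part of Hadoop, that flags arguments starting with a dash-like character other than the ASCII hyphen.

```python
import unicodedata


def find_non_ascii_dashes(args):
    """Return (index, argument, character name) for each argument whose
    first character is a dash-like character other than ASCII '-'."""
    hits = []
    for index, arg in enumerate(args):
        if not arg:
            continue
        first = arg[0]
        # Unicode category "Pd" is dash punctuation. It includes the ASCII
        # hyphen-minus too, so that one is excluded explicitly.
        if first != "-" and unicodedata.category(first) == "Pd":
            hits.append((index, arg, unicodedata.name(first)))
    return hits


# "\u2013" is the en-dash, as it appears before "mapper" in the question.
args = ["-input", "/user/hduser/ss_waits", "\u2013mapper", "mono mapper.exe"]
for index, arg, name in find_non_ascii_dashes(args):
    print(f"argument {index}: {arg!r} starts with {name}")
```

Commands pasted from word processors or web pages are a common source of such characters, so retyping the options by hand is a cheap check.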