Running the Python Code on Hadoop Failed

Tags:

python

hadoop-streaming

I tried to follow the instructions on this page, http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/, and ran the following command:

$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar \
    -input /user/root/wordcountpythontxt \
    -output /user/root/wordcountpythontxt-output \
    -mapper /user/root/wordcountpython/mapper.py \
    -reducer /user/root/wordcountpython/reducer.py \
    -file /user/root/mapper.py \
    -file /user/root/reducer.py

It fails with:

File: /user/root/mapper.py does not exist, or is not readable.
Streaming Command Fail
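The first message comes from the streaming client on the machine where bin/hadoop is run: it looks up every -file argument on the local filesystem, not in HDFS, so /user/root/mapper.py has to exist locally. A minimal pre-flight check in Python, assuming you want to validate the paths yourself before submitting (the helper name `missing_local_files` is hypothetical):

```python
import os


def missing_local_files(paths):
    """Return the paths that are not readable regular files on the local filesystem.

    Hypothetical helper: it mirrors the kind of check the streaming client
    performs on -file arguments before it submits the job.
    """
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]


if __name__ == "__main__":
    # Paths taken from the failing command; they are checked locally, not in HDFS.
    for p in missing_local_files(["/user/root/mapper.py", "/user/root/reducer.py"]):
        print(f"not a readable local file: {p}")
```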

When I browsed to the job details page (jobdetails.jsp), I found a lot of exceptions like this one:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program "/user/root/wordcountpython/mapper.py": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
    at java.lang.ProcessImpl.start(ProcessImpl.java:91)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 24 more
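The innermost cause, error=2, is the errno value ENOENT returned when the task tries to execute /user/root/wordcountpython/mapper.py: the task process looks that path up on the task node's local filesystem, where an HDFS path does not exist. The same errno is also reported when the script exists but the interpreter named on its #! line does not. A small Python reproduction (the helper name `try_exec` is hypothetical; the path is the one from the trace):

```python
import errno
import subprocess


def try_exec(path):
    """Try to execute `path` directly; return the errno on failure, None on success."""
    try:
        subprocess.run([path], check=False)
    except OSError as exc:
        # A missing executable surfaces as FileNotFoundError, whose errno is ENOENT.
        return exc.errno
    return None


if __name__ == "__main__":
    code = try_exec("/user/root/wordcountpython/mapper.py")
    print(code, errno.errorcode.get(code))
```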

I have not been able to fix this. How can I get the Python program to run?

asked Mar 12 '13 04:03 by Unmesha Sreeveni U.B

2 Answers

If you check the instructions at the link carefully, the tutorial runs the job like this:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
    -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py \
    -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py \
    -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output

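This shows that there is no need to copy mapper.py and reducer.py to HDFS: you can reference both files from the local filesystem, as /path/to/mapper.py. The -file paths are read from the local filesystem of the machine where you run bin/hadoop, and the -mapper/-reducer commands are executed on the task nodes' local filesystems, so HDFS paths such as /user/root/wordcountpython/mapper.py cannot be found in either place. Using local paths should avoid the error above.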
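If you script the submission, one option is to build the argument list from local paths. This is a sketch under assumptions: the scripts are assumed to live in a local directory such as /root/wordcountpython, and the jar path follows the question's layout. Each script is passed with -file, which ships it into the tasks' working directory, and is then named in -mapper/-reducer by its base name, a commonly used pattern that does not require the absolute local path to exist on every node. The function name `streaming_args` is hypothetical.

```python
import os
import shlex


def streaming_args(jar, mapper, reducer, input_path, output_path):
    """Build a bin/hadoop streaming argument list from *local* script paths.

    Hypothetical helper: -file ships each local script with the job, and
    -mapper/-reducer refer to the shipped copy by its base name.
    """
    return [
        "bin/hadoop", "jar", jar,
        "-file", mapper, "-mapper", os.path.basename(mapper),
        "-file", reducer, "-reducer", os.path.basename(reducer),
        "-input", input_path,    # HDFS path
        "-output", output_path,  # HDFS path; the job fails if it already exists
    ]


if __name__ == "__main__":
    args = streaming_args(
        "contrib/streaming/hadoop-streaming-1.0.4.jar",
        "/root/wordcountpython/mapper.py",   # assumed local location, not HDFS
        "/root/wordcountpython/reducer.py",
        "/user/root/wordcountpythontxt",
        "/user/root/wordcountpythontxt-output",
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The printed line can be pasted into a shell from the Hadoop install directory, or the list can be passed to subprocess.run from there.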

answered Oct 06 '22 00:10

Anil Muppalla

You might want to check that the #! line in mapper.py does not end with a DOS-style line ending (CRLF). If it does, Hadoop may not be able to find your Python interpreter, because the interpreter path then includes a trailing carriage return: for example, /usr/local/bin/python^M instead of /usr/local/bin/python, where ^M is the CR character. Try running dos2unix on both your mapper and your reducer.
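If dos2unix is not installed, the same check and conversion can be done in Python. A minimal sketch (the helper names are hypothetical, and it only converts CRLF pairs, which is the case described above, so it is a rough stand-in for dos2unix rather than a full replacement):

```python
import sys


def shebang_has_cr(path):
    """Return True if the first (#!) line of `path` ends with a carriage return."""
    with open(path, "rb") as f:
        first = f.readline()
    return first.startswith(b"#!") and first.rstrip(b"\n").endswith(b"\r")


def fix_crlf(path):
    """Convert CRLF line endings to LF in place; return True if the file changed."""
    with open(path, "rb") as f:
        data = f.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != data


if __name__ == "__main__":
    # Usage: python fix_scripts.py mapper.py reducer.py
    for path in sys.argv[1:]:
        if shebang_has_cr(path):
            fix_crlf(path)
            print(f"converted CRLF to LF: {path}")
```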

answered Oct 06 '22 01:10

John Pickard
