Grep seems not to be working for hadoop streaming
For:
hadoop jar /usr/local/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar -input /user/root/tmp2/user.data -output /user/root/selected_data -mapper '/bin/grep 1938678460' -reducer 'wc' -jobconf mapred.output.compress=false
I get:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
  at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
  at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
  at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
  at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
  at org.apache.hadoop.mapred.Child.main(Child.java:17
Any idea?
I also tried -mapper 'cat' -reducer '/bin/grep 1938678460'. cat works; grep does not.
I also checked that /bin/grep exists on all machines, and it does.
Is grep broken under streaming, or am I missing something?
I haven't tried this myself, but grep exits with a non-zero exit code when no input line matches (exit code 1, which fits the "subprocess failed with code 1" in your trace). If the input of a map task doesn't contain the string you grep for, grep returns that non-zero exit code and Hadoop treats the task as failed. Maybe something like "/bin/grep || true" works.
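One caveat about the "|| true" idea: as far as I know, streaming launches the -mapper command directly rather than through a shell, so "||" would reach grep as an ordinary argument. A small wrapper script avoids that. This is an untested sketch; the file name grep_mapper.sh and the function name grep_ok are made up for illustration.

```shell
#!/bin/sh
# grep_mapper.sh (hypothetical name): filter stdin with grep, but report
# success when nothing matched, so an input split without the pattern
# does not fail the task.

grep_ok() {
    status=0
    grep -- "$1" || status=$?
    # grep exits 1 when no line matched; treat that as success.
    if [ "$status" -eq 1 ]; then
        return 0
    fi
    # 0 means lines matched; anything above 1 is a real grep error.
    return "$status"
}

# Act as the mapper only when a pattern was passed on the command line.
if [ "$#" -gt 0 ]; then
    grep_ok "$1"
fi
```

Assuming the script is shipped with the job, the call would change to something like -mapper 'grep_mapper.sh 1938678460' -file grep_mapper.sh. Real grep errors (exit code 2 and up) still fail the task, which a plain "|| true" would hide.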