Hadoop streaming: reporting error

What is the best practice for reporting exceptions in Hadoop streaming with Python scripts?

For example, if a mapper script can't understand its input, how do I signal Hadoop to terminate the job and report an error message?

Do I use logging and finish off with sys.exit?

asked by jldupont, Apr 09 '26 02:04



1 Answer

If you want to signal an error, exit with a non-zero status code from your Python script. Hadoop then marks that task as failed, and the job fails once the task's retry attempts are exhausted. Anything you write to stderr is captured by Hadoop in the task logs, so that is where the error message should go. You can also update the task status and counters by writing lines to stderr of the form reporter:status:<message> or reporter:counter:<group>,<counter>,<amount>
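As a rough illustration, here is a minimal sketch of a mapper that follows this pattern. The tab-separated key/value input format, the function names, and the "Mapper"/"BadRecords" counter names are assumptions made up for the example. Only the reporter: line formats and the non-zero exit code come from the description above.

```python
import sys


def report_status(message, err=sys.stderr):
    # Status update, written to stderr in the reporter:status:<message> form.
    err.write(f"reporter:status:{message}\n")


def increment_counter(group, name, amount=1, err=sys.stderr):
    # Counter update, written in the reporter:counter:<group>,<counter>,<amount> form.
    err.write(f"reporter:counter:{group},{name},{amount}\n")


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Echo key<TAB>value records; return 1 on the first malformed line, else 0.

    The two-field, tab-separated format is a hypothetical input format.
    """
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            # A plain stderr line ends up in the task logs.
            err.write(f"bad input at line {line_no}: {line!r}\n")
            increment_counter("Mapper", "BadRecords", err=err)
            report_status("aborting on malformed input", err=err)
            return 1  # the caller turns this into the exit code
        key, value = fields
        out.write(f"{key}\t{value}\n")
    return 0


# In the real mapper script, finish with:
#     sys.exit(run_mapper(sys.stdin))
```

Keeping the logic in a function that takes the streams as arguments makes it easy to try locally, for example by piping a sample file through the script and checking the exit code with echo $?, before submitting the streaming job.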

answered by Chris White, Apr 11 '26 14:04
