What is the best practice for reporting exceptions in Hadoop streaming with Python scripts?
For example, say I have a mapper script that can't parse its input. How do I signal Hadoop to terminate the job and report an error message?
Should I use logging and finish with sys.exit?
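To signal an error, exit your Python script with a non-zero status code (e.g. sys.exit(1)). Hadoop streaming then treats the task attempt as failed and retries the task. Once the task has used up its allowed attempts, the job fails. You can write any log messages to stderr, and Hadoop will capture them in the task logs. You can also update the task status and counters by writing stderr lines prefixed with reporter:status:<msg> or reporter:counter:<group>,<name>,<increment>.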
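Below is a minimal sketch of a mapper that follows this advice. It assumes a hypothetical input format of one word<TAB>integer pair per line. The helper names (report_counter, report_status, run, main) and the counter group/name (MyMapper, MalformedLines) are illustrative only and are not part of any Hadoop API. On the first unparseable line, the mapper writes a log message to stderr, increments a counter, sets the task status, and returns a non-zero exit code. Python's logging module, which writes to stderr by default, would work just as well for the plain log lines.

```python
import sys

# Hypothetical counter group used only for this sketch.
COUNTER_GROUP = "MyMapper"


def report_counter(group, counter, amount, err=None):
    """Increment a Hadoop counter via the streaming stderr protocol."""
    err = sys.stderr if err is None else err
    err.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))
    err.flush()


def report_status(message, err=None):
    """Set the task status message via the streaming stderr protocol."""
    err = sys.stderr if err is None else err
    err.write("reporter:status:%s\n" % message)
    err.flush()


def run(lines, out, err):
    """Emit word<TAB>count pairs; return 0 on success, 1 at the first bad line."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        try:
            word, count = fields[0], int(fields[1])
        except (IndexError, ValueError):
            # Plain stderr text ends up in the task attempt's logs.
            err.write("bad input on line %d of this task's input: %r\n" % (lineno, line))
            report_counter(COUNTER_GROUP, "MalformedLines", 1, err)
            report_status("failing on malformed input", err)
            return 1
        out.write("%s\t%d\n" % (word, count))
    return 0


def main():
    # A non-zero exit status tells Hadoop streaming this task attempt failed.
    sys.exit(run(sys.stdin, sys.stdout, sys.stderr))
```

To use it as the mapper, end the file with if __name__ == "__main__": main() and pass the script to the streaming jar with -mapper. Stopping at the first bad line is a deliberate choice here. A mapper that should tolerate occasional bad records could instead increment the counter, skip the line, and keep going.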