
How can I debug Hadoop MapReduce? [duplicate]

I'm trying to build a MapReduce job.

It runs to completion but produces weird data at the end.

When I try to debug it using System.out.println("debug data"), nothing shows on screen.

Using the Java logging API to write to an external log file, printing with log.severe("log data"), or using the log4j logger method log.info("log data") doesn't work either.

Nothing works; the only time I see my debug messages is when there is an exception in the MapReduce job.

How can I fix this so I can see my debug messages, either in a file or on the screen?

Gabriel H asked Nov 13 '22 21:11



1 Answer

Since you are processing big data, the volume of your trace messages can be huge, which can itself cause problems. It's worth considering alternatives to System.out.println-style logging:

  • use Counters (here is a simple example)
  • write logs to HDFS using MultipleOutputs

The best thing about Counters and MultipleOutputs is that you can access them programmatically; in the case of MultipleOutputs, you can even run a MapReduce job to extract statistics from the logs.
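As a rough illustration of the Counters approach, here is a minimal sketch of a mapper that counts suspicious records instead of printing them. It assumes the newer org.apache.hadoop.mapreduce API and needs the Hadoop jars on the classpath; the class name DebugCountingMapper, the DebugCounter enum and the "one trimmed line per key" logic are all hypothetical and only stand in for your own job.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: the class name, counter names and record handling are illustrative only.
public class DebugCountingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Each enum constant becomes a named counter, grouped under the enum's class name.
    public enum DebugCounter { TOTAL_RECORDS, EMPTY_LINES, EMITTED }

    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        context.getCounter(DebugCounter.TOTAL_RECORDS).increment(1);

        String trimmed = line.toString().trim();
        if (trimmed.isEmpty()) {
            // Count the suspicious case instead of printing it.
            context.getCounter(DebugCounter.EMPTY_LINES).increment(1);
            return;
        }

        outKey.set(trimmed);
        context.write(outKey, ONE);
        context.getCounter(DebugCounter.EMITTED).increment(1);
    }
}
```

Counter totals are aggregated across all tasks and reported with the job's summary when it finishes. In the driver, after job.waitForCompletion(true) returns, they can also be read with job.getCounters().findCounter(DebugCountingMapper.DebugCounter.EMPTY_LINES).getValue().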

Another alternative to debugging in a production environment is unit testing: MiniMRCluster lets you run your MapReduce jobs inside unit tests.
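MiniMRCluster needs the Hadoop test jars, so it is not shown here. A lighter, complementary technique (not MiniMRCluster itself) is to keep the per-record logic in a plain static method that map() delegates to, so it can be checked without any cluster. The sketch below uses only the JDK; WordSplitter and tokens are hypothetical names, and the whitespace tokenizing is just a stand-in for whatever your mapper actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: the per-record logic a map() method would delegate to.
class WordSplitter {

    // Splits a line on whitespace and lower-cases each token; blank input yields no tokens.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hello  hadoop World")); // prints [hello, hadoop, world]
        System.out.println(tokens("   "));                 // prints []
    }
}
```

With the logic isolated like this, odd output can often be reproduced locally by feeding the method a few problematic input lines, where System.out.println works as usual.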

rystsov answered Nov 15 '22 12:11
