I want to debug a mapreduce script, and without going into much trouble tried to put some print statements in my program. But I cant seem to find them in any of the logs.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system.
In Hadoop,the output of Mapper is stored on local disk,as it is intermediate output. There is no need to store intermediate data on HDFS because : data write is costly and involves replication which further increases cost head and time.
The intermediate data for reduce tasks is stored at a data location with the most available space. The mappers and reducers pick disks in a scheduled order, for storing intermediate data at the job level within the number of local disks.
MapReduce is a processing technique and a program model for distributed computing based on java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
Actually stdout only shows the System.out.println()
of the non-map reduce classes.
The System.out.println()
for map and reduce phases can be seen in the logs. Easy way to access the logs is
http://localhost:50030/jobtracker.jsp->click on the completed job->click on map or reduce task->click on tasknumber->task logs->stdout logs.
Hope this helps
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With