
Debugging hadoop applications

I tried printing values with System.out.println(), but they don't appear on the console. How do I print values from a map/reduce application for debugging purposes in Hadoop?

Thanks, Deepak.

Deepak asked May 14 '10 14:05




1 Answer

The page @SquareCog points to is a very good source of information on debugging a MapReduce job once you are running it on a cloud.

Before you reach that point, though, you should consider writing unit tests for your mappers and reducers, so you can verify that the basic logic works. If you are interested in unit tests to test-drive your map and reduce logic, check out MRUnit, which works in a similar fashion to JUnit.
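As a rough illustration of the idea, here is a minimal sketch that keeps the map and reduce logic in plain static methods so it can be checked on a local machine before anything is submitted to a cluster. It deliberately uses neither MRUnit nor any Hadoop classes; the class name WordCountLogic, the word-count task and the check helper are all hypothetical, chosen only for the example. In a real job the Mapper and Reducer would delegate to methods like these.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical word-count logic, kept free of Hadoop types so it can be
// exercised off-cluster with ordinary assertions.
class WordCountLogic {

    // Map step: emit a (word, 1) pair for each whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Reduce step: sum all counts that arrived for a single key.
    static int reduce(Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Tiny assertion helper so the checks run without the -ea JVM flag.
    static void check(boolean ok, String message) {
        if (!ok) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = map("to be or not to be");
        check(pairs.size() == 6, "expected one pair per token");
        check(pairs.get(0).getKey().equals("to"), "first key should be 'to'");
        check(map("   ").isEmpty(), "blank line should emit nothing");
        check(reduce(Arrays.asList(1, 1, 1)) == 3, "reduce should sum counts");
        System.out.println("all checks passed");
    }
}
```

Because the logic never touches the cluster here, a failing check points at the map or reduce code itself rather than at job configuration or missing console output.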

Binary Nerd answered Oct 06 '22 23:10
