I tried printing out values using System.out.println(), but they don't appear on the console. How do I print out values in a map/reduce application for debugging purposes using Hadoop?
Thanks, Deepak.
MapReduce distributes fragments of data across the nodes in a Hadoop cluster. The goal is to split a dataset into chunks and process those chunks with the same algorithm at the same time. Processing in parallel on multiple machines greatly increases the speed of handling even petabytes of data.
The shuffle phase in Hadoop transfers the map output from the mappers to the reducers. The sort phase merges and sorts those map outputs. The mapper output is partitioned among the reducers, grouped by key, and sorted by key, so every reducer receives all the values associated with a given key.
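To make the map, shuffle/sort, and reduce steps concrete, here is a minimal sketch in plain Java that imitates them in memory for a word count. It does not use the Hadoop API at all; the class name ShuffleSortSketch and its methods are made up for illustration, and a TreeMap stands in for the grouping and sorting that the framework does between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of map -> shuffle/sort -> reduce (not Hadoop code).
class ShuffleSortSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token.toLowerCase(), 1));
            }
        }
        return out;
    }

    // Shuffle/sort step: group all values by key; the TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum every value that arrived for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three steps in sequence over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffleAndSort(mapped).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(run(Arrays.asList("the quick fox", "the lazy dog")));
    }
}
```

In a real job the grouping happens across machines and each reducer only sees its own partition of the keys, but the contract is the same: reduce is called once per key with all of that key's values.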
The page @SquareCog points to is a very good source of information on debugging a MapReduce job once you are running it on a cloud.
Before you reach that point, though, you should consider writing unit tests for your mappers and reducers, so you can verify that the basic logic works. If you want to test-drive your map and reduce logic, check out MRUnit, which works in a similar fashion to JUnit.
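As a rough illustration of the idea (this is not MRUnit's API, which needs the Hadoop and MRUnit jars on the classpath), you can keep the per-record logic in plain static methods and check it locally, where System.out.println() output is visible. Everything below is a sketch under assumptions: the class MaxTempLogic, the "year,temperature" record format, and the check helper are all invented for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical map/reduce logic kept free of Hadoop types so it can be tested locally.
class MaxTempLogic {

    // Map logic: parse a "year,temperature" record; malformed records yield no output.
    static Optional<Map.Entry<String, Integer>> mapLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            int temperature = Integer.parseInt(parts[1].trim());
            return Optional.of(new SimpleEntry<>(parts[0].trim(), temperature));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Reduce logic: the maximum of all values seen for one key.
    static int reduceValues(List<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int value : values) {
            max = Math.max(max, value);
        }
        return max;
    }

    // Tiny stand-in for a test framework assertion.
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("failed: " + description);
        }
        System.out.println("ok: " + description);
    }

    public static void main(String[] args) {
        check(mapLine("1950,22").get().getKey().equals("1950"), "well-formed record keeps its year");
        check(mapLine("1950,22").get().getValue() == 22, "well-formed record keeps its temperature");
        check(!mapLine("1950,abc").isPresent(), "non-numeric temperature is skipped");
        check(!mapLine("garbage").isPresent(), "record without a comma is skipped");
        check(reduceValues(Arrays.asList(3, 22, -5)) == 22, "reduce picks the maximum");
    }
}
```

The actual Mapper and Reducer classes would then only convert between Hadoop's writable types and these methods, which leaves very little logic that can only be debugged on the cluster.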