What exactly is this keyword Context in Hadoop MapReduce world in new API terms?
Its extensively used to write output pairs out of Maps and Reduce, however I am not sure if it can be used somewhere else and what's exactly happening whenever I use context. Is it a Iterator with different name?
What is relation between Class Mapper.Context, Class Reducer.Context and Job.Context?
Can someone please explain this starting with Layman's terms and then going in detail. Not able understand much from Hadoop API documentations.
Thanks for your time and help.
WordCount is a simple application that counts the number of occurrences of each word in a given input set. This works with a local-standalone, pseudo-distributed or fully-distributed Hadoop installation (Single Node Setup).
In the driver class, we set the configuration of our MapReduce job to run in Hadoop. We specify the name of the job, the data type of input/output of the mapper and reducer. We also specify the names of the mapper and reducer classes. The path of the input and output folder is also specified.
The mapper processes the data and creates several small chunks of data. Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
Context object: allows the Mapper/Reducer to interact with the rest of the Hadoop system. It includes configuration data for the job as well as interfaces which allow it to emit output.
Applications can use the Context:
The new API makes extensive use of Context objects that allow the user code to communicate with MapRduce system.
It unifies the role of JobConf, OutputCollector, and Reporter from old API.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With