 

What is the keyword Context in the Hadoop programming world?

What exactly is the keyword Context in the Hadoop MapReduce world, in terms of the new API?

It is extensively used to write output pairs out of the map and reduce functions, but I am not sure whether it can be used anywhere else, or what exactly happens whenever I use context. Is it an Iterator with a different name?

What is the relation between the classes Mapper.Context, Reducer.Context, and Job.Context?

Can someone please explain this, starting in layman's terms and then going into detail? I am not able to understand much from the Hadoop API documentation.

Thanks for your time and help.

asked Nov 16 '14 by Brijesh


People also ask

What is word count in Hadoop?

WordCount is a simple application that counts the number of occurrences of each word in a given input set. This works with a local-standalone, pseudo-distributed or fully-distributed Hadoop installation (Single Node Setup).
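For concreteness, here is a minimal sketch of the map side of such a WordCount application in the new API; the class and field names are assumptions for this illustration, not part of the original answer:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical WordCount mapper: emits (word, 1) for every token in a line.
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // "context" here is Mapper.Context; context.write(...) emits an output pair.
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            word.set(token);
            context.write(word, ONE);
        }
    }
}
```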

What is driver code in Hadoop?

In the driver class, we set up the configuration of our MapReduce job to run on Hadoop. We specify the name of the job and the data types of the input/output of the mapper and reducer. We also specify the names of the mapper and reducer classes. The paths of the input and output folders are also specified.
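A minimal sketch of such a driver class, assuming the hypothetical TokenizerMapper shown above and the SumReducer sketched under the next question:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: wires the mapper/reducer classes, the output types,
// and the input/output paths into a Job, then submits it and waits.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");   // job name
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);          // optional combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input folder
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output folder
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```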

What is a mapper and reducer in Hadoop?

Map stage − The mapper processes the input data and creates several small chunks of intermediate data. Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which is stored in HDFS.
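A minimal sketch of a matching reducer that sums the per-word counts shuffled in from the mappers (the class name SumReducer is an assumption for this illustration):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical WordCount reducer: sums the counts for each word.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {   // all counts shuffled in for this word
            sum += count.get();
        }
        result.set(sum);
        // "context" here is Reducer.Context; it writes the final (word, total) pair.
        context.write(key, result);
    }
}
```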


2 Answers

Context object: allows the Mapper/Reducer to interact with the rest of the Hadoop system. It includes configuration data for the job as well as interfaces which allow it to emit output; a short sketch of these calls appears after the list below.

Applications can use the Context:

  • to report progress
  • to set application-level status messages
  • to update Counters
  • to indicate they are alive
  • to get the values that are stored in the job configuration across the map/reduce phases
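Here is a minimal illustration of these calls inside a mapper; the configuration key wordcount.skip.word, the counter names, and the class name are made up for this sketch:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper showing the common Context calls listed above.
public class ContextDemoMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private String skipWord;

    @Override
    protected void setup(Context context) {
        // Read a value stored in the job configuration by the driver.
        skipWord = context.getConfiguration().get("wordcount.skip.word", "");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.setStatus("processing offset " + key.get());   // application-level status message
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.equals(skipWord)) {
                context.getCounter("demo", "skipped.words").increment(1);  // update a Counter
                continue;
            }
            context.write(new Text(token), ONE);   // emit an output pair
        }
        context.progress();   // report progress / signal that the task is alive
    }
}
```

In recent Hadoop versions, both Mapper.Context and Reducer.Context ultimately extend the JobContext interface, which is why the same configuration and counter calls are available in both the map and reduce phases.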
answered Oct 21 '22 by SMA


The new API makes extensive use of Context objects that allow the user code to communicate with the MapReduce system.

It unifies the roles of JobConf, OutputCollector, and Reporter from the old API.
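A rough side-by-side sketch of that unification; the mapper bodies, class names, and counter names are illustrative, not part of the original answer:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.Mapper;

// Old API (org.apache.hadoop.mapred): output, reporting and configuration
// are split across OutputCollector, Reporter and JobConf.
class OldApiMapper extends MapReduceBase
        implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        output.collect(value, new IntWritable(1));         // emit an output pair
        reporter.incrCounter("demo", "lines", 1);          // counters / progress
        // configuration arrives separately via configure(JobConf job)
    }
}

// New API (org.apache.hadoop.mapreduce): the single Context object covers all three roles.
class NewApiMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, new IntWritable(1));           // was OutputCollector.collect
        context.getCounter("demo", "lines").increment(1);   // was Reporter.incrCounter
        Configuration conf = context.getConfiguration();    // was JobConf
    }
}
```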

answered Oct 21 '22 by sras