
Difference between reduce task and a reducer

"a reducer is different than a reduce task. A reducer can run multiple reduce tasks". Can someone explain this using the example below?

foo.txt: Sweet, this is the foo file
bar.txt: This is the bar file

I am using 2 reducers. What are the reduce tasks in this case, and on what basis are multiple reduce tasks created in a reducer?

Arighna asked Feb 07 '23 13:02



2 Answers

Reducer is a class that contains the reduce function shown below:

protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    // Called once per key, with all the values grouped under that key.
}

A reduce task is a program running on a node that executes the reduce function of the Reducer class.

You can think of a reduce task as a running instance of the Reducer.
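To make this concrete for the two files in the question, here is a minimal sketch in plain Java (no Hadoop dependency) that simulates a word count with two reduce tasks. The class name ReduceTaskSketch and its helper methods are made up for illustration, and the sketch assumes words are lowercased and split on non-word characters. The partition formula is the one used by Hadoop's default HashPartitioner. Each key lands in exactly one reduce task, and that task calls reduce once for each key it receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceTaskSketch {

    // Same formula as Hadoop's default HashPartitioner: decides which
    // reduce task receives a given key.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Stand-in for Reducer.reduce(): sums the counts for a single key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Simulates map, shuffle and reduce in one process.
    // Returns one output map per reduce task (index = task number).
    static List<Map<String, Integer>> run(List<String> lines, int numReduceTasks) {
        // One bucket per reduce task: key -> values emitted for that key.
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }

        // "Map" phase: emit (word, 1) and route it to a partition.
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                int p = partitionFor(word, numReduceTasks);
                partitions.get(p).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }

        // "Reduce" phase: each task calls reduce once per key in its partition.
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (Map<String, List<Integer>> partition : partitions) {
            Map<String, Integer> out = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                out.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
            outputs.add(out);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Sweet, this is the foo file",   // foo.txt
                "This is the bar file");         // bar.txt
        List<Map<String, Integer>> outputs = run(lines, 2);
        for (int i = 0; i < outputs.size(); i++) {
            System.out.println("reduce task " + i + " -> " + outputs.get(i));
        }
    }
}
```

Which words end up in which task depends only on the hash of the key and the number of reduce tasks, not on which input file the word came from.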

Have a look at the Apache MapReduce tutorial page for more details (Payload section).

Ravindra babu answered Feb 13 '23 04:02



As I understand it, a reducer is a slot of computational resources that can be used to run reduce tasks. A reducer is assigned one task at a time and runs it until it completes or fails. Once the task reaches an end state and cleanup is done, the reducer becomes available to process another reduce task.
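The slot idea can be illustrated with an analogy in plain Java. This is only a sketch, not Hadoop code: a fixed thread pool stands in for the reducer slots, and the class name SlotReuseSketch and the method runTasks are hypothetical. With 2 slots and 4 tasks, at most two tasks run at once, and a slot picks up the next pending task once it has finished its current one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SlotReuseSketch {

    // Runs numTasks stand-in "reduce tasks" on numSlots worker threads.
    // Returns the names of the threads ("slots") that executed at least one task.
    static Set<String> runTasks(int numSlots, int numTasks) {
        ExecutorService slots = Executors.newFixedThreadPool(numSlots);
        Set<String> slotsUsed = ConcurrentHashMap.newKeySet();
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int t = 0; t < numTasks; t++) {
                final int taskId = t;
                // Each submitted task waits in the queue until a slot is free.
                results.add(slots.submit(() -> {
                    slotsUsed.add(Thread.currentThread().getName());
                    return taskId;
                }));
            }
            // Wait for every task to reach an end state.
            for (Future<Integer> f : results) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            slots.shutdown();
        }
        return slotsUsed;
    }

    public static void main(String[] args) {
        Set<String> used = runTasks(2, 4);
        System.out.println("4 tasks ran on " + used.size() + " slot(s): " + used);
    }
}
```

However many tasks are submitted, the number of distinct slots used never exceeds the pool size, which matches the "assign, run to an end state, then reuse" behaviour described above.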

In YARN, the concepts are a bit different, though.

rahulbmv answered Feb 13 '23 04:02
