"a reducer is different than a reduce task. A reducer can run multiple reduce tasks". Can someone explain this with the below example?
foo.txt: Sweet, this is the foo file
bar.txt: This is the bar file
I am using 2 reducers. What are the reduce tasks here, and on what basis are multiple reduce tasks created within a reducer?
Reducer is a class that contains a reduce function, as below:
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
        throws IOException, InterruptedException {
    // called once per key, with all the values grouped under that key
}
A reduce task is a program running on a node that executes the reduce function of the Reducer class.
You can think of a reduce task as an instance of Reducer.
Have a look at the Apache MapReduce tutorial page (Payload section) for more details.
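To make this concrete for the foo.txt/bar.txt example, here is a minimal sketch in plain Java (no Hadoop dependency) of how keys could be spread over 2 reduce tasks in a word-count style job. The class and method names (ReduceTaskSketch, partition, run) are made up for illustration. The partition formula mirrors Hadoop's default HashPartitioner, but it is applied to String.hashCode() here, which is an assumption: Hadoop's Text keys hash differently, so the actual key-to-task assignment in a real job may not match. In a real job the number of reduce tasks is set on the job, for example with job.setNumReduceTasks(2).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceTaskSketch {

    // Same formula as Hadoop's default HashPartitioner, applied to a String key.
    // Assumption: a real job would hash a Text key, which gives different values.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Stand-in for map + shuffle + reduce: split lines into lower-cased words,
    // route each word to one reduce task, and sum the counts per word there.
    static Map<Integer, Map<String, Integer>> run(List<String> lines, int numReduceTasks) {
        Map<Integer, Map<String, Integer>> perTask = new TreeMap<>();
        for (int t = 0; t < numReduceTasks; t++) {
            perTask.put(t, new TreeMap<>());
        }
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                int task = partition(word, numReduceTasks);
                // Summing here plays the role of reduce(key, values, context).
                perTask.get(task).merge(word, 1, Integer::sum);
            }
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Sweet, this is the foo file",   // contents of foo.txt
                "This is the bar file");         // contents of bar.txt
        run(lines, 2).forEach((task, counts) ->
                System.out.println("reduce task " + task + " -> " + counts));
    }
}
```

Each distinct word ends up in exactly one reduce task, and that task's reduce function sees all the values for that word, regardless of which input file they came from.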
From my understanding, a reducer is a slot of computational resource that can be used to run reduce tasks. A reducer is assigned one task at a time and runs it to completion or failure; once the task reaches an end state and cleanup has finished, the reducer is available to process another reduce task.
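As a loose analogy only (this is not Hadoop code), the slot idea can be sketched with a fixed-size thread pool: each worker thread stands in for a reduce slot, and each submitted job stands in for a reduce task. The names ReduceSlotSketch and runTasks are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ReduceSlotSketch {

    // Runs numTasks stand-in reduce tasks on numSlots worker threads and
    // records which worker thread (the "slot") executed each task.
    static Map<Integer, String> runTasks(int numSlots, int numTasks) {
        Map<Integer, String> slotOfTask = new ConcurrentHashMap<>();
        ExecutorService slots = Executors.newFixedThreadPool(numSlots);
        for (int i = 0; i < numTasks; i++) {
            final int taskId = i;
            // A slot picks up the next task only after finishing its current one.
            slots.execute(() -> {
                slotOfTask.put(taskId, Thread.currentThread().getName());
            });
        }
        slots.shutdown();
        try {
            slots.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return slotOfTask;
    }

    public static void main(String[] args) {
        runTasks(2, 5).forEach((task, slot) ->
                System.out.println("reduce task " + task + " ran in " + slot));
    }
}
```

With 2 slots and 5 tasks, at least one slot has to run more than one task, which is the sense in which a reducer "can run multiple reduce tasks" under this reading.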
In YARN, the concepts are a bit different, though: it allocates containers on demand instead of using fixed reduce slots.