Is my understanding correct that the JobTracker launches a task (mapper/reducer) on the datanode where the input split is stored, runs that task against that piece of data, and the mapper stores its intermediate output in local storage?

So my question is: since the mapper runs on a datanode, does it store its intermediate data in the datanode's RAM? And since the datanode's disk is part of HDFS, how can the intermediate output not be stored on HDFS?
The output of the mapper (intermediate data) is stored on the local file system (not HDFS) of each mapper's data node. This is typically a temporary directory, which the Hadoop administrator can set in the configuration. Once the map task completes and the data has been transferred to the reducers, this intermediate data is cleaned up and is no longer accessible.
Each map task initially stores its output in an in-memory buffer on the datanode.

Once the buffer fills to 80% of its capacity (by default), the task starts spilling to the local disk of the datanode itself (not HDFS). In Hadoop 2.0, this disk location can be viewed or modified in mapred-site.xml under the property
mapreduce.cluster.local.dir
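For illustration, a minimal mapred-site.xml fragment setting this property might look like the following (the paths are hypothetical examples; listing several comma-separated directories spreads spill I/O across disks):

```xml
<configuration>
  <!-- Local (non-HDFS) directories where map tasks spill intermediate data.
       Multiple comma-separated paths spread the spill load across disks. -->
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>
</configuration>
```

The related buffer settings (the in-memory sort buffer size and the spill threshold) are also configurable, so the "80% of the buffer" behavior described above is the default rather than a fixed rule.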