I know that the number of mappers can be set based on the DFS split size by setting mapred.min.split.size to dfs.block.size.
Similarly, how can I set the number of reducers based on my mapper output size?
PS: I know that the options below can be used to manipulate the number of reducers: mapred.tasktracker.reduce.tasks.maximum, mapred.reduce.tasks
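For example, I can already force the reducer count at submit time with the generic -D option (jar, driver, and paths here are just placeholders):
hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=20 /input /output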
The number of reducers cannot be set after job submission. Think about it this way: the partitioner is called on the mapper output, and it needs to know the number of reducers in order to partition.
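You can see this in the default HashPartitioner, whose getPartition is essentially the following sketch: every map output record is routed using the reducer count, so that count must be fixed before any mapper runs.
int getPartition(K key, V value, int numReduceTasks) {
    // mask off the sign bit, then bucket by reducer count
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}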
To set the number of reducer tasks dynamically:
The number of maps is usually driven by the number of DFS blocks in the input files, which is why people adjust their DFS block size to adjust the number of maps.
So in the code below, the number of reducer tasks is set dynamically at runtime to match the number of map tasks.
In Java code:
// requires org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path
long defaultBlockSize = 0;
int numOfReduce = 10; // fallback default; used if the size lookup fails
long inputFileLength = 0;
try {
    // HDFS file system handle
    FileSystem fileSystem = FileSystem.get(this.getConf());
    // total length of the input file(s) stored in HDFS
    inputFileLength = fileSystem.getContentSummary(
            new Path(PROP_HDFS_INPUT_LOCATION)).getLength();
    // default block size for the input path
    defaultBlockSize = fileSystem.getDefaultBlockSize(
            new Path(PROP_HDFS_INPUT_LOCATION));
    if (inputFileLength > 0 && defaultBlockSize > 0) {
        // roughly two reducers per input block (one block ~ one map task)
        numOfReduce = (int) (((inputFileLength / defaultBlockSize) + 1) * 2);
    }
    System.out.println("NumOfReduce : " + numOfReduce);
} catch (Exception e) {
    LOGGER.error(" Exception{} ", e);
}
job.setNumReduceTasks(numOfReduce);
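For completeness, here is a minimal sketch of where this fits in a Tool-based driver (class, job, and path names are hypothetical); the key point is that setNumReduceTasks is called before the job is submitted:
public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "size-based-reducers"); // hypothetical job name
    job.setJarByClass(MyDriver.class);                           // hypothetical driver class
    FileInputFormat.addInputPath(job, new Path(PROP_HDFS_INPUT_LOCATION));
    FileOutputFormat.setOutputPath(job, new Path(args[0]));      // hypothetical output path
    // ... mapper, reducer, and output key/value class setup ...
    job.setNumReduceTasks(numOfReduce); // computed as above, set before submission
    return job.waitForCompletion(true) ? 0 : 1;
}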