How to set number of reducer dynamically based on my mapper output size?

I know that the number of mappers can be set based on my DFS split size, by setting mapred.min.split.size to dfs.block.size.

Similarly, how can I set the number of reducers based on my mapper output size?

PS: I know that the options below can be used to manipulate the number of reducers:

mapred.tasktracker.reduce.tasks.maximum
mapred.reduce.tasks
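For reference, a minimal sketch (not from the original question) of setting those properties in a driver; on Hadoop 2+ the equivalent key for mapred.reduce.tasks is mapreduce.job.reduces, and the job name here is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Fixed reducer count, chosen at submission time
conf.setInt("mapred.reduce.tasks", 4);
// Per-tasktracker cap on concurrently running reduce tasks
conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);
Job job = Job.getInstance(conf, "example-job");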

asked Oct 31 '22 by Makubex


2 Answers

The number of reducers cannot be set after job submission. Think about it this way: the partitioner is called on the mapper output, and it needs to know the number of reducers in order to partition.
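To make that concrete, this is the shape of Hadoop's default HashPartitioner: the framework passes the configured reducer count into getPartition for every record, so the count has to be fixed before any map output is partitioned:

import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // numReduceTasks is the job's reducer count, already fixed at submission
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}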

answered Nov 15 '22 by STLSOFT Big Data Training


To set the number of reducer tasks dynamically:

The number of maps is usually driven by the number of DFS blocks in the input files, which is why people sometimes adjust their DFS block size to adjust the number of maps.

So in the code below, let us set the number of reducer tasks dynamically at runtime, based on the number of input blocks (and hence map tasks).

In Java code:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

long defaultBlockSize = 0;
int numOfReduce = 10; // fallback value if the HDFS lookup fails
long inputFileLength = 0;
try {
    // HDFS file system handle, taken from the job configuration
    FileSystem fileSystem = FileSystem.get(this.getConf());

    // Total length of the input file(s) stored in HDFS
    inputFileLength = fileSystem.getContentSummary(
            new Path(PROP_HDFS_INPUT_LOCATION)).getLength();

    // Default HDFS block size for the input location
    defaultBlockSize = fileSystem.getDefaultBlockSize(
            new Path(PROP_HDFS_INPUT_LOCATION));

    if (inputFileLength > 0 && defaultBlockSize > 0) {
        // Number of input blocks, times two reducers per block
        numOfReduce = (int) (((inputFileLength / defaultBlockSize) + 1) * 2);
    }
    System.out.println("numOfReduce : " + numOfReduce);
} catch (Exception e) {
    LOGGER.error("Exception {}", e);
}

job.setNumReduceTasks(numOfReduce);
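For context, a sketch of where this computation could sit, assuming a standard Tool-based driver; DynamicReducerJob and computeReducerCount are hypothetical names wrapping the logic above, and the input path comes from the command line:

import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class DynamicReducerJob extends Configured implements Tool {

    // Same block-counting heuristic as the snippet above
    private int computeReducerCount(Path input) throws IOException {
        FileSystem fs = FileSystem.get(getConf());
        long length = fs.getContentSummary(input).getLength();
        long blockSize = fs.getDefaultBlockSize(input);
        return (int) ((length / blockSize + 1) * 2);
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "dynamic-reducers");
        job.setJarByClass(DynamicReducerJob.class);
        // ... set mapper/reducer classes, input and output paths here ...
        job.setNumReduceTasks(computeReducerCount(new Path(args[0])));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new DynamicReducerJob(), args));
    }
}

The key point either way is that setNumReduceTasks must run in the driver, before job submission.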
answered Nov 15 '22 by Gaurav Mishra