I know that the number of mappers can be set based on the DFS split size by setting mapred.min.split.size to dfs.block.size.
Similarly, how can I set the number of reducers based on my mapper output size?
PS: I know that the options below can be used to manipulate the number of reducers: mapred.tasktracker.reduce.tasks.maximum, mapred.reduce.tasks
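For example, I can already force the reducer count at submit time with the generic -D option (jar, driver, and paths here are just placeholders):
hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=20 /input /output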
The number of reducers cannot be set after job submission. Think about it this way: the partitioner is called on the mapper output, and it needs to know the number of reducers in order to partition.
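You can see this in the default HashPartitioner, whose getPartition is essentially the following sketch: every map output record is routed using the reducer count, so that count must be fixed before any mapper runs.
int getPartition(K key, V value, int numReduceTasks) {
    // mask off the sign bit, then bucket by reducer count
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}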
To set the number of reducer tasks dynamically:
The number of maps is usually driven by the number of DFS blocks in the input files, which is why people adjust their DFS block size to adjust the number of maps.
So in the code below, the number of reducer tasks is set dynamically at runtime to match the number of map tasks.
In Java code:
// requires org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path
long defaultBlockSize = 0;
int numOfReduce = 10; // fallback default; used if the size lookup fails
long inputFileLength = 0;
try {
    // HDFS file system handle
    FileSystem fileSystem = FileSystem.get(this.getConf());
    // total length of the input file(s) stored in HDFS
    inputFileLength = fileSystem.getContentSummary(
            new Path(PROP_HDFS_INPUT_LOCATION)).getLength();
    // default block size for the input path
    defaultBlockSize = fileSystem.getDefaultBlockSize(
            new Path(PROP_HDFS_INPUT_LOCATION));
    if (inputFileLength > 0 && defaultBlockSize > 0) {
        // roughly two reducers per input block (one block ~ one map task)
        numOfReduce = (int) (((inputFileLength / defaultBlockSize) + 1) * 2);
    }
    System.out.println("NumOfReduce : " + numOfReduce);
} catch (Exception e) {
    LOGGER.error(" Exception{} ", e);
}
job.setNumReduceTasks(numOfReduce);
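For completeness, here is a minimal sketch of where this fits in a Tool-based driver (class, job, and path names are hypothetical); the key point is that setNumReduceTasks is called before the job is submitted:
public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "size-based-reducers"); // hypothetical job name
    job.setJarByClass(MyDriver.class);                           // hypothetical driver class
    FileInputFormat.addInputPath(job, new Path(PROP_HDFS_INPUT_LOCATION));
    FileOutputFormat.setOutputPath(job, new Path(args[0]));      // hypothetical output path
    // ... mapper, reducer, and output key/value class setup ...
    job.setNumReduceTasks(numOfReduce); // computed as above, set before submission
    return job.waitForCompletion(true) ? 0 : 1;
}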