MultipleOutputFormat in Hadoop

I'm new to Hadoop and am trying out the WordCount program.

To try out multiple output files, I use MultipleOutputs. This link helped me do it: http://hadoop.apache.org/common/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html

In my driver class I had:

    MultipleOutputs.addNamedOutput(conf, "even",
            org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
            IntWritable.class);

    MultipleOutputs.addNamedOutput(conf, "odd",
            org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
            IntWritable.class);

and my reduce class became this:

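// Imports needed at the top of the enclosing file: java.io.IOException,
// java.util.Iterator, org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text,
// org.apache.hadoop.mapred.* and org.apache.hadoop.mapred.lib.MultipleOutputs.
public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs mos;

    // Called once per reduce task: set up the named outputs for this task.
    @Override
    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Total the counts for this word.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Route the record to the "even" or "odd" named output by its total.
        if (sum % 2 == 0) {
            mos.getCollector("even", reporter).collect(key, new IntWritable(sum));
        } else {
            mos.getCollector("odd", reporter).collect(key, new IntWritable(sum));
        }
        // output.collect(key, new IntWritable(sum));
    }

    // Flush and close the named outputs when the task finishes.
    @Override
    public void close() throws IOException {
        mos.close();
    }
}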

This worked, but I get a lot of files (one odd and one even for every map-reduce).

Question: How can I have just two output files (odd and even), so that every odd output of every map-reduce gets written into the odd file, and likewise for even?

asked Aug 16 '10 06:08

raj

1 Answer

Each reducer uses its own OutputFormat to write records, which is why you get a set of odd and even files per reducer. This is by design, so that the reducers can write in parallel.

If you want just a single odd file and a single even file, you'll need to set mapred.reduce.tasks to 1. Performance will suffer, though, because all the mappers will feed into a single reducer.
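To make the effect of the reducer count concrete, here is a toy sketch in plain Java with no Hadoop dependency. It imitates hash partitioning of keys across reduce tasks and lists which named-output files would end up being written. The class and method names are made up for illustration, the `<name>-r-<partition>` file naming is an assumption about how the files are labelled, and `String.hashCode()` stands in for the key's real hash, so the partition numbers in an actual job will differ.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class NamedOutputFileCount {

    // Imitates hash partitioning: a non-negative hash of the key,
    // modulo the number of reduce tasks (String.hashCode() is a stand-in).
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns the (assumed) names of the files that receive at least one record.
    public static Set<String> outputFiles(String[] words, int[] counts, int numReduceTasks) {
        Set<String> files = new TreeSet<>();
        for (int i = 0; i < words.length; i++) {
            // Same even/odd routing as the reducer in the question.
            String namedOutput = (counts[i] % 2 == 0) ? "even" : "odd";
            int partition = partitionFor(words[i], numReduceTasks);
            files.add(String.format(Locale.ROOT, "%s-r-%05d", namedOutput, partition));
        }
        return files;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c", "d", "e", "f", "g", "h"};
        int[] counts = {1, 2, 3, 4, 5, 6, 7, 8};

        // Four reduce tasks: each task writes its own files.
        // prints [even-r-00000, even-r-00002, odd-r-00001, odd-r-00003]
        System.out.println(outputFiles(words, counts, 4));

        // One reduce task: everything lands in a single pair of files.
        // prints [even-r-00000, odd-r-00000]
        System.out.println(outputFiles(words, counts, 1));
    }
}
```

In the real driver, the counterpart of the second call is setting mapred.reduce.tasks to 1, for example with `conf.setNumReduceTasks(1)` on the JobConf.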

Another option is to change the process that reads these files so that it accepts multiple input files, or to write a separate process that merges the files together.
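If you go the merge route, one possible sketch is below. It assumes the job output has already been copied to a local directory and that the files for each named output start with `even-` or `odd-`. Both are assumptions, and `MergeNamedOutputs` is a hypothetical helper written for this answer, not part of Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeNamedOutputs {

    // Concatenates every regular file in dir whose name starts with
    // prefix + "-" into target, in file-name order.
    // Returns the number of files merged.
    public static int merge(Path dir, String prefix, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts);

        // Creates the target, or truncates it if it already exists.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: MergeNamedOutputs <local-output-dir>");
            return;
        }
        Path dir = Paths.get(args[0]);
        for (String name : new String[] {"even", "odd"}) {
            // The ".txt" targets do not match the "<name>-*" pattern,
            // so a rerun does not merge a previous result into itself.
            int merged = merge(dir, name, dir.resolve(name + ".txt"));
            System.out.println("Merged " + merged + " files into " + name + ".txt");
        }
    }
}
```

Run it with the local copy of the output directory as the only argument. Hadoop's `hadoop fs -getmerge` shell command performs a similar concatenation for a whole HDFS directory, which may be enough when the odd and even files do not share a directory.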

answered Oct 06 '22 01:10

bajafresh4life

Tags: java, hadoop, mapreduce
