
Writing to multiple folders in hadoop?

Tags:

hadoop

I am trying to write my reducer output to separate folders.

My driver has the following code:
FileOutputFormat.setOutputPath(job, new Path(output));
// Signature: MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class, NullWritable.class);
MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);

And then my reducer has the following code:
mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
mos.write("bar", key, NullWritable.get());
mos.write("foobar", key, NullWritable.get());

But in the output, I see:

output/foo-r-0001
output/foo-r-0002
output/foobar-r-0001
output/bar-r-0001


But what I want is:

output/foo/part-r-0001
output/foo/part-r-0002
output/bar/part-r-0001
output/foobar/part-r-0001

How do I do this? Thanks

asked Oct 11 '13 22:10 by frazman


1 Answer

If you mean the new-API MultipleOutputs (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs), the simplest way is to do one of the following from your reducer:

  1. Use a named output with a base output path: write(namedOutput, key, value, baseOutputPath).
  2. Use only a base output path, without a named output: write(key, value, baseOutputPath).

In your case it's point 1, so please change the following:

mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
mos.write("bar", key, NullWritable.get());
mos.write("foobar", key, NullWritable.get());

to:

mos.write("foo", NullWritable.get(), new Text(jsn.toString()), "foo/part");
mos.write("bar", key, NullWritable.get(), "bar/part");
mos.write("foobar", key, NullWritable.get(), "foobar/part");

Here "foo/part", "bar/part" and "foobar/part" are the baseOutputPath values. Because each one contains a "/", the directories foo, bar and foobar are created under the job output directory, with the part-r-xxxxx files inside them.

You might also try point 2 above, which doesn't need any named output.
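
To see why a baseOutputPath such as "foo/part" ends up as a file inside a foo directory, here is a minimal plain-Java sketch of the naming pattern. It is not Hadoop code: the class BaseOutputPathDemo and its methods uniqueFile and resolve are hypothetical names made up for illustration. It assumes the usual pattern of base path, then "-r-" for a reduce task, then the partition number padded to five digits, and that a named output written without an explicit base path uses its own name as the base path.

```java
import java.util.Locale;

// Simplified model (NOT Hadoop code) of how an output file name is built
// from a baseOutputPath. All names here are hypothetical.
class BaseOutputPathDemo {

    // Builds "<base>-<m|r>-<5-digit partition><extension>".
    static String uniqueFile(String baseOutputPath, char taskType,
                             int partition, String extension) {
        return String.format(Locale.ROOT, "%s-%c-%05d%s",
                baseOutputPath, taskType, partition, extension);
    }

    // Resolves the file name against the job output directory for a reduce task.
    static String resolve(String outputDir, String baseOutputPath, int partition) {
        return outputDir + "/" + uniqueFile(baseOutputPath, 'r', partition, "");
    }

    public static void main(String[] args) {
        String outputDir = "output";
        // Base path "foo" (assumed default for the named output "foo"):
        // the file lands directly in the output directory.
        System.out.println(resolve(outputDir, "foo", 1));      // output/foo-r-00001
        // Base path "foo/part": the "/" puts the file in a foo subdirectory.
        System.out.println(resolve(outputDir, "foo/part", 1)); // output/foo/part-r-00001
    }
}
```

The same reasoning applies to both write variants, since each takes a baseOutputPath: whatever precedes the last "/" becomes a subdirectory of the job output path.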

Please get back to me for further clarification, if needed.

answered Sep 20 '22 03:09 by SSaikia_JtheRocker