
A MapReduce job with plain text input and Avro output

Tags: hadoop, avro

I'm pretty confused about using Avro with MapReduce and can't find any good tutorials to follow.

It seems that classes like AvroJob and AvroMapper are geared toward jobs where both the input and the output are Avro data files. What about when your input is just plain text?

Specifically:

My mapper takes LongWritable keys and Text values as input. It emits Text keys and MyAvroRecord values.

My reducer takes Text keys and an Iterator of MyAvroRecord values as input, and emits Text keys and MyAvroRecord values.

How do I get an OutputFormat that will write these Text keys and MyAvroRecord values to a file?

Cheers, Dave

Dave asked Mar 15 '12 02:03



1 Answer

Ok, so I figured this out.

Rather than a mapper that outputs Text keys and MyAvroRecord values, I needed one that emits AvroKey keys and AvroValue values. Its output can then feed straight into an AvroReducer, and I could just use AvroJob.setOutputSchema() to handle the output (I didn't have to implement an OutputFormat at all).
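For anyone landing here later, this is roughly what that setup looks like with the old `org.apache.avro.mapred` API. Treat it as a minimal sketch rather than my exact job: to keep it self-contained it emits a string key and a long count instead of MyAvroRecord, and the class names (`TextToAvroJob`, `LineMapper`, `SumReducer`) and the word-splitting logic are made up for illustration.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical job: plain text in, Avro data file of (word, count) pairs out.
public class TextToAvroJob {

  // A plain Hadoop mapper (not an AvroMapper): it reads text lines and
  // wraps its output in AvroKey / AvroValue so the Avro shuffle can handle it.
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroKey<Utf8>, AvroValue<Long>> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroKey<Utf8>, AvroValue<Long>> out,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+")) {
        if (!word.isEmpty()) {
          out.collect(new AvroKey<Utf8>(new Utf8(word)), new AvroValue<Long>(1L));
        }
      }
    }
  }

  // The AvroReducer receives the unwrapped key and values directly.
  public static class SumReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) {
        sum += count;
      }
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  public static void main(String[] args) throws IOException {
    // Schema of the (key, value) pairs; a real job would use its record's schema as the value.
    Schema pairSchema = Pair.getPairSchema(
        Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.LONG));

    JobConf conf = new JobConf(TextToAvroJob.class);
    conf.setJobName("text-to-avro");

    conf.setInputFormat(TextInputFormat.class);       // plain text input
    conf.setMapperClass(LineMapper.class);            // ordinary Hadoop mapper
    AvroJob.setMapOutputSchema(conf, pairSchema);     // schema for the shuffle
    AvroJob.setReducerClass(conf, SumReducer.class);  // Avro reducer
    AvroJob.setOutputSchema(conf, pairSchema);        // output written as an Avro data file

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
```

The point is that only the reducer and output sides go through AvroJob. The input format and mapper are set on the JobConf as usual, and no custom OutputFormat is needed.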

Dave answered Oct 17 '22 01:10
