I'm pretty confused about using Avro with MapReduce and can't find any good tutorials to follow.
It seems that classes like AvroJob and AvroMapper are geared toward jobs where both the input and the output are Avro data files. What about when your input is just plain text?
Specifically:
My mapper takes LongWritable keys and Text values as input. It emits Text keys and MyAvroRecord values.
My reducer takes Text keys and an Iterator of MyAvroRecords as input, and emits Text keys and MyAvroRecord values.
How do I get an OutputFormat that writes these Text keys and MyAvroRecord values to a file?
Cheers, Dave
Ok, so I figured this out.
Rather than a mapper that outputs Text keys and MyAvroRecord values, I needed one that produces AvroKey keys and AvroValue values. That mapper can feed its results straight into an AvroReducer, and I could just use AvroJob.setOutputSchema() to handle the output (I didn't have to implement an OutputFormat at all).
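For anyone wiring this up, here is a minimal sketch of what that setup might look like with the old org.apache.avro.mapred API. It is not the exact code from the job above. Assumptions: a GenericRecord with a made-up two-field schema stands in for MyAvroRecord, the class and field names (TextToAvroJob, LineMapper, RecordReducer, offset, length) are hypothetical, and the call to AvroJob.setMapOutputSchema() is not mentioned in the answer but is likely needed so the shuffle knows how to serialize the AvroKey/AvroValue pairs.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class TextToAvroJob {
  // Hypothetical stand-in for MyAvroRecord's schema (an assumption, not from the thread).
  static final Schema RECORD_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"MyAvroRecord\",\"fields\":["
      + "{\"name\":\"offset\",\"type\":\"long\"},"
      + "{\"name\":\"length\",\"type\":\"int\"}]}");
  static final Schema STRING_SCHEMA = Schema.create(Schema.Type.STRING);
  // Key/value pair schema, used here for both the shuffle and the final output.
  static final Schema PAIR_SCHEMA = Pair.getPairSchema(STRING_SCHEMA, RECORD_SCHEMA);

  // A plain Hadoop mapper: text in, AvroKey/AvroValue out.
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroKey<CharSequence>, AvroValue<GenericRecord>> {
    @Override
    public void map(LongWritable offset, Text line,
        OutputCollector<AvroKey<CharSequence>, AvroValue<GenericRecord>> out,
        Reporter reporter) throws IOException {
      GenericRecord record = new GenericData.Record(RECORD_SCHEMA);
      record.put("offset", offset.get());
      record.put("length", line.getLength());
      out.collect(new AvroKey<CharSequence>(new Utf8(line.toString())),
          new AvroValue<GenericRecord>(record));
    }
  }

  // An AvroReducer that receives the unwrapped key and values and emits pairs.
  public static class RecordReducer
      extends AvroReducer<CharSequence, GenericRecord, Pair<CharSequence, GenericRecord>> {
    @Override
    public void reduce(CharSequence key, Iterable<GenericRecord> values,
        AvroCollector<Pair<CharSequence, GenericRecord>> collector,
        Reporter reporter) throws IOException {
      for (GenericRecord value : values) {
        collector.collect(new Pair<CharSequence, GenericRecord>(
            key, STRING_SCHEMA, value, RECORD_SCHEMA));
      }
    }
  }

  static JobConf configure(Path input, Path output) {
    JobConf conf = new JobConf(TextToAvroJob.class);
    conf.setJobName("text-to-avro");
    conf.setInputFormat(TextInputFormat.class);   // plain-text input
    conf.setMapperClass(LineMapper.class);        // non-Avro mapper
    AvroJob.setMapOutputSchema(conf, PAIR_SCHEMA);
    AvroJob.setReducerClass(conf, RecordReducer.class);
    AvroJob.setOutputSchema(conf, PAIR_SCHEMA);   // no custom OutputFormat
    FileInputFormat.setInputPaths(conf, input);
    FileOutputFormat.setOutputPath(conf, output);
    return conf;
  }

  public static void main(String[] args) throws IOException {
    JobClient.runJob(configure(new Path(args[0]), new Path(args[1])));
  }
}
```

Emitting a Pair from the reducer is one way to keep both the key and the record in the output file; if only the records are needed, the reducer's output type and the schema passed to AvroJob.setOutputSchema() could be the record schema alone.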