I want to get started with using Avro with MapReduce. Can someone suggest a good tutorial or example to get started with? I couldn't find much through an internet search.
You can use either ConvertRecord or ConvertAvroToJSON to convert your incoming Avro data to JSON. If the incoming Avro files do not have a schema embedded in them, then you will have to provide it, either to an AvroReader (for ConvertRecord) or the "Avro schema" property (for ConvertAvroToJSON).
The Avro format is the ideal candidate for storing data in a data lake landing zone because: 1. Data from the landing zone is usually read as a whole for further processing by downstream systems (the row-based format is more efficient in this case).
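To illustrate the row-based versus column-based point, here is a minimal plain-Java sketch. It does not use Avro or any real file format; the class name, the sample records, and the helper methods are all made up for illustration. It only shows why reading whole records is cheap in a row layout (one contiguous slice) and needs one jump per field in a column layout.

```java
import java.util.ArrayList;
import java.util.List;

class RowVsColumnLayout {
    // Three hypothetical records, each with the fields (id, name, score).
    static final String[][] RECORDS = {
        {"1", "alice", "90"},
        {"2", "bob", "85"},
        {"3", "carol", "70"},
    };

    // Row-based layout: all fields of one record are stored next to each other.
    static List<String> rowLayout(String[][] records) {
        List<String> out = new ArrayList<>();
        for (String[] record : records) {
            for (String field : record) {
                out.add(field);
            }
        }
        return out;
    }

    // Column-based layout: all values of one field are stored next to each other.
    static List<String> columnLayout(String[][] records) {
        List<String> out = new ArrayList<>();
        int fieldCount = records.length == 0 ? 0 : records[0].length;
        for (int f = 0; f < fieldCount; f++) {
            for (String[] record : records) {
                out.add(record[f]);
            }
        }
        return out;
    }

    // Rebuild record `index` from the row layout: a single contiguous slice.
    static List<String> readRecordFromRows(List<String> rows, int index, int fieldCount) {
        return new ArrayList<>(rows.subList(index * fieldCount, (index + 1) * fieldCount));
    }

    // Rebuild record `index` from the column layout: one jump per field.
    static List<String> readRecordFromColumns(List<String> cols, int index,
                                              int recordCount, int fieldCount) {
        List<String> record = new ArrayList<>();
        for (int f = 0; f < fieldCount; f++) {
            record.add(cols.get(f * recordCount + index));
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> rows = rowLayout(RECORDS);
        List<String> cols = columnLayout(RECORDS);
        System.out.println(rows);
        System.out.println(cols);
        System.out.println(readRecordFromRows(rows, 1, 3));
        System.out.println(readRecordFromColumns(cols, 1, 3, 3));
    }
}
```

Both read methods return the same record; the difference is only in how scattered the reads are, which is what matters when downstream jobs consume every record in full.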
ORC, Parquet, and Avro are also machine-readable binary formats, which is to say that the files look like gibberish to humans. If you need a human-readable format like JSON or XML, then you should probably re-consider why you're using Hadoop in the first place.
I recently did a project that was heavily based on Avro data and, not having used this data format before, I had to start from scratch. You are right that it is rather hard to find much help online when getting started with Avro. The material that I would recommend to you is:
Finally, my last suggestion to you is to use Avro 1.4.1 with Hadoop 0.20.2 and ONLY that combination. I had some major issues getting my code to run using Hadoop 0.21 and more recent Avro versions.
Other links:
The main problem I see with the documentation (what little exists) is that it focuses on the very laborious "generic" approach, which seems odd because it combines the worst of both worlds: you must still provide a full schema for the data, but you get no benefit from static types. The automatic code generation is more convenient, but less well covered.
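To make that trade-off concrete, here is a rough plain-Java sketch. It deliberately does not use the Avro library: a map checked against a field list at runtime stands in for the "generic" approach, and an ordinary hand-written class stands in for what a code generator might emit. Every name in it (GenericVsSpecific, SCHEMA_FIELDS, User) is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GenericVsSpecific {
    // Stand-in for a schema: the field names a record is allowed to carry.
    static final List<String> SCHEMA_FIELDS = Arrays.asList("name", "age");

    // "Generic" style: fields live in an untyped map keyed by string name.
    static Map<String, Object> genericUser(String name, int age) {
        Map<String, Object> record = new HashMap<>();
        put(record, "name", name);
        put(record, "age", age);
        return record;
    }

    // Field names are only checked at runtime, against the schema stand-in.
    static void put(Map<String, Object> record, String field, Object value) {
        if (!SCHEMA_FIELDS.contains(field)) {
            throw new IllegalArgumentException("Not a schema field: " + field);
        }
        record.put(field, value);
    }

    // "Specific" style: an ordinary class with statically typed fields.
    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> generic = genericUser("alice", 30);
        // The caller must know the type and cast; a wrong cast fails only at runtime.
        int genericAge = (Integer) generic.get("age");

        User specific = new User("alice", 30);
        // The compiler knows the type; a misspelled field name would not compile.
        int specificAge = specific.age;

        System.out.println(genericAge + " " + specificAge);
    }
}
```

In both styles the field list has to be written down somewhere, but only the second one lets the compiler catch a misspelled field or a wrong type.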
The Avro source code at https://github.com/apache/avro/blob/trunk/lang/java/mapred does have examples. For example, TestReflectJob helped me write a MapReduce job using my pre-defined domain objects.