How do you extract first the schema and then the data from an Avro file in Java? Identical to this question, except in Java.
I've seen examples of how to get the schema from an .avsc file, but not from an .avro file. What direction should I be looking in?
Schema schema = new Schema.Parser().parse( new File("/home/Hadoop/Avro/schema/emp.avsc") );
For those using the Apache Avro C# library, the utility function DataFileReader<GenericRecord>.OpenReader(filename); can be used to instantiate the dataFileReader. Once instantiated, the dataFileReader is used just like in Java.
When Avro data is stored in a file, its schema is stored with it, so that files may be processed later by any program. If the program reading the data expects a different schema this can be easily resolved, since both schemas are present.
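To make the point about the embedded schema concrete, here is a minimal sketch that pulls the schema JSON out of a file header using only the Java standard library. It assumes the object container file layout described in the Avro specification (the four magic bytes `Obj` followed by 1, then a metadata map whose `avro.schema` entry holds the schema). The class and method names (`AvroHeaderPeek`, `readSchemaJson`, `readLong`, `readBytes`) are hypothetical and exist only for this illustration; in real code the Avro library's DataFileReader is the better choice.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class AvroHeaderPeek {

    // Reads one Avro "long": a variable-length, zigzag-encoded integer.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("truncated variable-length integer");
            }
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
            if (shift > 63) {
                throw new IOException("variable-length integer too long");
            }
        }
        return (raw >>> 1) ^ -(raw & 1); // undo the zigzag encoding
    }

    // Reads a length-prefixed byte sequence (used for both map keys and values).
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) {
            throw new IOException("invalid length: " + len);
        }
        byte[] buf = new byte[(int) len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // Returns the value of the "avro.schema" header entry, or null if absent.
    static String readSchemaJson(InputStream in) throws IOException {
        byte[] magic = new byte[4];
        new DataInputStream(in).readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        String schema = null;
        long count;
        // The metadata map is written as blocks of key/value pairs, ended by a zero count.
        while ((count = readLong(in)) != 0) {
            if (count < 0) {
                count = -count;
                readLong(in); // a negative count is followed by the block size in bytes
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        return schema;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an .avro file.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

The sketch stops after the metadata map and never touches the sync marker or the data blocks, so it only shows where the writer's schema lives; it does not decode any records.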
The Avro file needs to be converted into a file type that Boomi is able to read and write. In this example we are using JSON as that file type. The scripts below have been successfully run on local atoms but have not been tested on cloud atoms. You will also need to install the Apache Avro jar files.
If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:
// No reader schema is passed, so the writer schema embedded in the file is used.
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
Schema schema = dataFileReader.getSchema();
System.out.println(schema);
And then you can read the data inside the file:
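GenericRecord record = null;
while (dataFileReader.hasNext()) {
    // Passing the previous record lets Avro reuse the object instead of allocating a new one.
    record = dataFileReader.next(record);
    System.out.println(record);
}
dataFileReader.close();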