 

Write POJOs to a Parquet file using reflection

Hi, I am looking for an API to write Parquet files from POJOs that I already have. I was able to generate an Avro schema using reflection and then create a Parquet schema from it using AvroSchemaConverter. However, I cannot find a way to convert the POJOs to Avro GenericRecords; if I could, I would be able to use AvroParquetWriter to write the POJOs out to Parquet files. Any suggestions?

Urvishsinh Mahida asked Nov 10 '22 00:11


1 Answer

If you want to go through Avro, you have two options:

1) Let Avro generate your POJOs (see the tutorial here). The generated classes implement SpecificRecord and can therefore be used with AvroParquetWriter.

2) Write the conversion from your POJO to GenericRecord yourself. You can do this manually or, for a more generic solution, with reflection. However, I ran into difficulties with this approach when I tried to read the data back: based on the supplied schema, Avro found the POJO on the classpath and tried to instantiate a SpecificRecord instead of a GenericRecord. For this reason I went with option 1.
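To make the reflection idea in option 2 concrete, here is a minimal sketch that uses only the JDK. It walks the instance fields of a POJO and collects name/value pairs into a Map, which stands in for the GenericRecord. With Avro on the classpath, each pair would presumably go into a GenericData.Record via put(name, value) instead. The class PojoToRecordSketch, the User POJO and the helper toFieldMap are hypothetical names made up for this illustration, and the sketch assumes flat POJOs (no nested objects or collections).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

public class PojoToRecordSketch {

    // Hypothetical POJO, used only for illustration.
    public static class User {
        private static final String IGNORED = "static fields are skipped";
        private final String name;
        private final int age;

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Collects the instance fields of a POJO as name -> value pairs.
    // The Map is a stand-in for an Avro GenericRecord (assumption: with Avro,
    // each pair would be stored with record.put(fieldName, value)).
    public static Map<String, Object> toFieldMap(Object pojo) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Field field : pojo.getClass().getDeclaredFields()) {
            int mods = field.getModifiers();
            // Skip fields that are not part of the record's data.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true); // allow reading private fields
            try {
                values.put(field.getName(), field.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Object> values = toFieldMap(new User("Ada", 36));
        System.out.println(values.get("name") + " / " + values.get("age"));
    }
}
```

This only covers the write side. It does not address the read-side problem described above, where Avro picks up the POJO class from the classpath.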

Parquet now also supports writing POJOs directly. Here is the pull request on the Parquet GitHub page. However, I think this is not part of an official release yet. In other words, I did not find this code in Maven.

karolovbrat answered Nov 15 '22 07:11