
How to read a Parquet file in standalone Java code? [closed]

Tags: java, parquet

The Parquet docs from Cloudera show examples of integration with Pig/Hive/Impala, but in many cases I want to read the Parquet file itself for debugging purposes.

Is there a straightforward Java reader API to read a Parquet file?

Thanks, Yang

teddy teddy asked Feb 19 '15 19:02




2 Answers

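Old method (deprecated):

// file is an org.apache.hadoop.fs.Path pointing at the Parquet file
AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
GenericRecord nextRecord = reader.read();

New method:

// file is an org.apache.hadoop.fs.Path pointing at the Parquet file
ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord>builder(file).build();
// read() returns the next record, or null once the file is exhausted
GenericRecord nextRecord = reader.read();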

I got this from here and have used this in my test cases successfully.

rishiehari answered Oct 15 '22 23:10


You can use AvroParquetReader from the parquet-avro library to read a Parquet file as a sequence of Avro GenericRecord objects.
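
For debugging, a complete standalone program along these lines might look like the sketch below. It assumes parquet-avro and a Hadoop client library are on the classpath. The class name ParquetDump, the dump helper, and the default file name data.parquet are made up for illustration. Newer parquet-avro releases also offer a builder overload that takes an InputFile instead of a Hadoop Path, so check which one your version prefers.

```java
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

// Hypothetical class name, used only for this illustration.
public class ParquetDump {

    // Prints every record in the file and returns how many were read.
    static long dump(Path file) throws IOException {
        long count = 0;
        // ParquetReader is Closeable, so try-with-resources closes the file for us.
        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(file).build()) {
            GenericRecord record;
            // read() returns null once there are no more records.
            while ((record = reader.read()) != null) {
                System.out.println(record);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "data.parquet" is a placeholder; pass the real path as the first argument.
        Path file = new Path(args.length > 0 ? args[0] : "data.parquet");
        long count = dump(file);
        System.out.println("Read " + count + " records");
    }
}
```

Each GenericRecord prints in a JSON-like form, which is usually enough to eyeball the contents. To inspect a single column, record.get("fieldName") returns that field's value.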

kostya answered Oct 15 '22 21:10