 

Write Parquet files to HDFS using the Java API without using Avro or MR

What is a simple way to write Parquet files to HDFS (using the Java API) by creating the Parquet schema directly from a POJO, without using Avro or MR?

The samples I found are outdated, use deprecated methods, and also rely on Avro, Spark, or MR.

Krishas asked Aug 29 '16 09:08




1 Answer

Indeed, there are not many samples available for reading or writing Apache Parquet files without the help of an external framework.

The core Parquet library is parquet-column, whose test files show how to read and write records directly: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/test/java/org/apache/parquet/io/TestColumnIO.java

You then just need to use the same functionality with an HDFS file. For that part, you can follow this Stack Overflow question: Accessing files in HDFS using Java

UPDATE, in response to the deprecated parts of the API: AvroWriteSupport should be replaced by AvroParquetWriter. I also checked ParquetWriter: it is not deprecated and can be used safely.
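For the "schema of a POJO" part of the question, here is a minimal sketch that derives a Parquet message-type definition from a class by reflection, using only the JDK. The class name PojoParquetSchema, the User POJO, and the type mapping (primitives as required, String as optional UTF8 binary) are assumptions made for illustration; they are not part of the Parquet library, and nested or collection fields are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

public class PojoParquetSchema {

    // Hypothetical POJO, used only for illustration.
    public static class User {
        long id;
        String name;
        double score;
        boolean active;
    }

    // Assumed mapping from Java field types to Parquet primitive type names.
    private static final Map<Class<?>, String> TYPES = new HashMap<>();
    static {
        TYPES.put(int.class, "int32");
        TYPES.put(long.class, "int64");
        TYPES.put(float.class, "float");
        TYPES.put(double.class, "double");
        TYPES.put(boolean.class, "boolean");
        TYPES.put(String.class, "binary");
    }

    // Builds the textual message-type definition for a flat POJO.
    public static String schemaFor(Class<?> pojo) {
        StringBuilder sb = new StringBuilder();
        sb.append("message ").append(pojo.getSimpleName()).append(" {\n");
        for (Field f : pojo.getDeclaredFields()) {
            // Skip static and compiler-generated fields.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            String type = TYPES.get(f.getType());
            if (type == null) {
                throw new IllegalArgumentException("Unsupported field type: " + f);
            }
            // Primitives can never be null, so they are marked required.
            String repetition = f.getType().isPrimitive() ? "required" : "optional";
            sb.append("  ").append(repetition).append(' ')
              .append(type).append(' ').append(f.getName());
            if (f.getType() == String.class) {
                sb.append(" (UTF8)"); // annotate the binary column as text
            }
            sb.append(";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(schemaFor(User.class));
    }
}
```

The resulting string uses the textual schema syntax that parquet-column's MessageTypeParser.parseMessageType accepts, so it can serve as the starting point for a ParquetWriter that does not go through Avro.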

Regards,

Loïc

loicmathieu answered Oct 25 '22 20:10