What is a simple way to write a POJO to HDFS in Parquet format (using the Java API) by directly creating a Parquet schema for it, without using Avro or MR?
The samples I found were outdated, used deprecated methods, and all relied on Avro, Spark, or MR.
Avro is a row-based storage format, whereas Parquet is a columnar storage format. Parquet is much better suited to analytical querying: reads are far more efficient than writes, while write operations are faster in Avro. On disk, Parquet actually stores data in a hybrid manner: it partitions the data horizontally into row groups and stores each row group column by column.
What is Parquet? Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.
Indeed, there are not many samples available for reading/writing Apache Parquet files without the help of an external framework.
The core Parquet library is parquet-column, where you can find some test files that read/write directly: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/test/java/org/apache/parquet/io/TestColumnIO.java
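As an illustration, here is a minimal sketch of declaring a Parquet MessageType by hand using the Types builder from parquet-column. The Pojo name and its id/name/score fields are hypothetical, chosen only for the example:

import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class PojoSchema {
    // Build the Parquet schema directly, mirroring the fields of the POJO.
    public static MessageType schema() {
        return Types.buildMessage()
                .required(PrimitiveTypeName.INT64).named("id")
                .required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("name")
                .optional(PrimitiveTypeName.DOUBLE).named("score")
                .named("Pojo"); // name of the message (record) type
    }
}

The same schema can also be written as a string and parsed with MessageTypeParser.parseMessageType, as in the write sketch further below.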
You then just need to use the same functionality with an HDFS file, as sketched below. You can follow this SO question for the HDFS side: Accessing files in HDFS using Java
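Putting the two together, here is a minimal write sketch, assuming a namenode reachable at hdfs://namenode:8020 (a placeholder; point it at your own cluster) and using the example Group API from parquet-hadoop instead of Avro. Note that in recent parquet-mr versions the builder(Path) overload is deprecated in favor of HadoopOutputFile, but it still works:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class PojoParquetWrite {
    public static void main(String[] args) throws Exception {
        // Declare the schema directly as a string, without Avro.
        MessageType schema = MessageTypeParser.parseMessageType(
                "message Pojo { required int64 id; required binary name (UTF8); }");
        Configuration conf = new Configuration();
        // hdfs://namenode:8020 is a placeholder; point it at your cluster.
        Path path = new Path("hdfs://namenode:8020/tmp/pojo.parquet");
        try (ParquetWriter<Group> writer = ExampleParquetWriter.builder(path)
                .withConf(conf)
                .withType(schema)
                .build()) {
            SimpleGroupFactory factory = new SimpleGroupFactory(schema);
            // One Group per POJO instance.
            Group row = factory.newGroup()
                    .append("id", 1L)
                    .append("name", "alice");
            writer.write(row);
        }
    }
}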
UPDATE: to address the deprecated parts of the API: AvroWriteSupport should be replaced by AvroParquetWriter. I checked ParquetWriter; it is not deprecated and can be used safely.
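For completeness, reading the file back without Avro follows the same pattern with GroupReadSupport; again a sketch, reusing the placeholder path from above:

import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class PojoParquetRead {
    public static void main(String[] args) throws Exception {
        Path path = new Path("hdfs://namenode:8020/tmp/pojo.parquet");
        try (ParquetReader<Group> reader =
                ParquetReader.builder(new GroupReadSupport(), path).build()) {
            Group row;
            // read() returns null once the file is exhausted.
            while ((row = reader.read()) != null) {
                System.out.println(row.getLong("id", 0) + " " + row.getString("name", 0));
            }
        }
    }
}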
Regards,
Loïc