I'm using the following code to create a ParquetWriter and to write records to it:
ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter<>(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);
final GenericRecord record = new GenericData.Record(avroSchema);
parquetWriter.write(record);
But this only allows creating new files (at the specified path). Is there a way to append data to an existing Parquet file (at path)? Caching the ParquetWriter is not feasible in my case.
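Since parquet-mr's ParquetWriter cannot open an existing file for appending, a common workaround is to write each batch to a new part file in the same directory and treat the directory as one dataset, which is how downstream readers such as Spark and Hive already consume Parquet. Below is a minimal sketch along those lines; writeBatch, dataDir, and the part-file naming scheme are hypothetical, and it uses the Avro bindings (AvroParquetWriter) rather than the raw WriteSupport above.

import java.io.IOException;
import java.util.List;
import java.util.UUID;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class PartFileWriter {

    // Hypothetical helper: writes one batch to a fresh part file under dataDir.
    public static void writeBatch(String dataDir, Schema avroSchema,
                                  List<GenericRecord> batch) throws IOException {
        // A unique name per batch avoids clobbering files written earlier.
        Path path = new Path(dataDir, "part-" + UUID.randomUUID() + ".parquet");
        try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(path)
                             .withSchema(avroSchema)
                             .withCompressionCodec(CompressionCodecName.SNAPPY)
                             .build()) {
            for (GenericRecord record : batch) {
                writer.write(record);
            }
        }
    }
}

Each call produces a small immutable file, so no writer needs to be cached between batches.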
MATLAB's parquetwrite function, introduced in R2019a, does not currently support appending to preexisting Parquet files on disk.
Append or overwrite an existing Parquet file: using the append save mode, you can append a DataFrame to an existing Parquet file. To overwrite it instead, use the overwrite save mode.
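For reference, here is a minimal sketch of both save modes using Spark's Java API; the save helper and the Dataset<Row> argument are assumptions, and the PySpark equivalent appears further below.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class SaveModes {
    // Hypothetical helper operating on an already-loaded Dataset<Row>.
    static void save(Dataset<Row> df) {
        // Append adds new part files under the target path...
        df.write().mode(SaveMode.Append).parquet("parquet_data_file");
        // ...while Overwrite replaces whatever is already there.
        df.write().mode(SaveMode.Overwrite).parquet("parquet_data_file");
    }
}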
The Apache Parquet merge tool is an interactive command-line tool that merges multiple Parquet table increment files into a single table increment file containing the merged segments.
Spark's API has a SaveMode called append: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/SaveMode.html, which I believe solves your problem.
Example of use:
df.write.mode('append').parquet('parquet_data_file')
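Note that append adds new part files under parquet_data_file, which Spark treats as a directory-based dataset; it does not modify a single existing .parquet file in place.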