I've been hunting around for a solution to this question.
It appears to me that there is no way to embed reading and writing Parquet format in a Java program without pulling in dependencies on HDFS and Hadoop. Is this correct?
I want to read and write on a client machine, outside of a Hadoop cluster.
I started to get excited about Apache Drill, but it appears that it must run as a separate process. What I need is an in-process ability to read and write a file using the Parquet format.
This project provides a library that reads Parquet files into Java objects.
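If you go that route, reading can stay fully in-process. As a minimal sketch (using parquet-avro's AvroParquetReader, which is an assumption on my part and not necessarily the library this answer refers to), reading records back from a local file looks like this; Hadoop's Path class appears, but only as a library type for a plain local file:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetReader;
    import org.apache.parquet.hadoop.ParquetReader;

    public class ReadExample {
        public static void main(String[] args) throws Exception {
            // Reads a local Parquet file record by record; no cluster involved
            try (ParquetReader<GenericRecord> reader =
                     AvroParquetReader.<GenericRecord>builder(new Path("/tmp/parquet/data.parquet")).build()) {
                GenericRecord record;
                while ((record = reader.read()) != null) {
                    System.out.println(record);
                }
            }
        }
    }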
You don't need HDFS or a Hadoop cluster to consume a Parquet file. There are several ways to consume Parquet; for example, you can read it with Apache Spark.
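For illustration, here is a minimal sketch of reading Parquet with Spark's Java API in local mode (assuming spark-sql is on the classpath; note that Spark itself pulls in Hadoop client jars as dependencies, though no running cluster is needed):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkParquetExample {
        public static void main(String[] args) {
            // local[*] runs Spark in-process inside the JVM; no cluster required
            SparkSession spark = SparkSession.builder()
                    .appName("parquet-example")
                    .master("local[*]")
                    .getOrCreate();

            Dataset<Row> df = spark.read().parquet("/tmp/parquet/data.parquet");
            df.show();
            spark.stop();
        }
    }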
Parquet stores binary data in a column-oriented way: the values of each column are laid out adjacent to one another, which enables better compression. It is especially good for queries that read particular columns from a "wide" table (one with many columns), since only the needed columns are read and I/O is minimized.
You can also write Parquet files outside a Hadoop cluster using plain Java with the parquet-avro library. The Hadoop classes are still on the classpath as library dependencies, but no running cluster or HDFS installation is required.
Here is sample Java code that writes Parquet format to local disk.
    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroSchemaConverter;
    import org.apache.parquet.avro.AvroWriteSupport;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.api.WriteSupport;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;
    import org.apache.parquet.schema.MessageType;

    public class Test {

        private static final int BLOCK_SIZE = ParquetWriter.DEFAULT_BLOCK_SIZE;
        private static final int PAGE_SIZE = ParquetWriter.DEFAULT_PAGE_SIZE;

        void test() throws IOException {
            // Parse the Avro schema and derive the Parquet schema from it
            final String schemaLocation = "/tmp/avro_format.json";
            final Schema avroSchema = new Schema.Parser().parse(new File(schemaLocation));
            final MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema);
            final WriteSupport<GenericRecord> writeSupport = new AvroWriteSupport(parquetSchema, avroSchema);

            // Path is Hadoop's class, but this is a plain local file; no cluster is involved
            final String parquetFile = "/tmp/parquet/data.parquet";
            final Path path = new Path(parquetFile);
            final ParquetWriter<GenericRecord> parquetWriter =
                    new ParquetWriter<>(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);

            // Build one record matching the schema and write it out
            final GenericRecord record = new GenericData.Record(avroSchema);
            record.put("id", 1);
            record.put("age", 10);
            record.put("name", "ABC");
            record.put("place", "BCD");
            parquetWriter.write(record);
            parquetWriter.close();
        }
    }
avro_format.json:

    {
      "type": "record",
      "name": "Pojo",
      "namespace": "com.xx.test",
      "fields": [
        { "name": "id",    "type": [ "int",    "null" ] },
        { "name": "age",   "type": [ "int",    "null" ] },
        { "name": "name",  "type": [ "string", "null" ] },
        { "name": "place", "type": [ "string", "null" ] }
      ]
    }
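As a side note, the five-argument ParquetWriter constructor used above is deprecated in recent Parquet releases. Here is a sketch of the same write using AvroParquetWriter's builder instead (assuming a recent parquet-avro version; same schema file and output path as above):

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;

    public class BuilderExample {
        public static void main(String[] args) throws Exception {
            Schema avroSchema = new Schema.Parser().parse(new File("/tmp/avro_format.json"));

            // The builder performs the Avro-to-Parquet schema conversion internally
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/parquet/data.parquet"))
                             .withSchema(avroSchema)
                             .withCompressionCodec(CompressionCodecName.SNAPPY)
                             .build()) {
                GenericRecord record = new GenericData.Record(avroSchema);
                record.put("id", 1);
                record.put("age", 10);
                record.put("name", "ABC");
                record.put("place", "BCD");
                writer.write(record);
            }
        }
    }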
Hope this helps.