I want to write my collection to a .parquet file so that it can later be read using Spark.
So far I am creating the file with this code:
package com.contrib.parquet

import org.apache.avro.SchemaBuilder
import org.apache.avro.reflect.ReflectData
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter
import org.apache.parquet.hadoop.{ParquetFileWriter, ParquetWriter}
import org.apache.parquet.hadoop.metadata.CompressionCodecName

object ParquetWriter {

  def main(args: Array[String]): Unit = {
    // Avro schema describing the records to write.
    val schema = SchemaBuilder
      .record("Record")
      .fields()
      .requiredString("name")
      .requiredInt("id")
      .endRecord()

    // Parquet writer backed by the reflection-based Avro data model,
    // so it can write case class instances directly.
    val writer: ParquetWriter[Record] = AvroParquetWriter
      .builder[Record](new Path("/tmp/parquetResult"))
      .withConf(new Configuration)
      .withDataModel(ReflectData.get)
      .withCompressionCodec(CompressionCodecName.SNAPPY)
      .withSchema(schema)
      .withWriteMode(ParquetFileWriter.Mode.OVERWRITE)
      .build()

    Seq(Record("nameOne", 1), Record("nameTwo", 2)).foreach(writer.write)
    writer.close()
  }

  case class Record(name: String, id: Int)
}
This creates a Parquet file successfully.
When I try to read that file using Spark, I get a java.lang.NoSuchMethodError: org.apache.parquet.column.values.ValuesReader.initFromPage error.
Spark code:
import org.apache.spark.sql.{Encoders, SparkSession}

val master = "local[4]"
val sparkCtx = SparkSession
  .builder()
  .appName("ParquetReader")
  .master(master)
  .getOrCreate()

// Record is the same case class as in the writer.
val schema = Encoders.product[Record].schema

val df = sparkCtx.read.parquet("/tmp/parquetResult")
df.show(100, false)
How do I write Parquet files so that they can be read using Spark? I don't want to have a local Spark app just to write this file.
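A java.lang.NoSuchMethodError on ValuesReader.initFromPage typically means two different Parquet versions end up on the same classpath: the parquet-avro dependency used for writing pulls in a parquet-column whose initFromPage signature differs from the one Spark was compiled against. One way out is to pin the writer's Parquet dependency to the version your Spark release bundles. A minimal build.sbt sketch, assuming sbt; the version number below is a placeholder assumption to be replaced with the parquet-column version found in your Spark distribution's jars directory:

// build.sbt (sketch): keep the writer's Parquet artifacts at the same version
// as the Parquet jars bundled with the Spark release that will read the files.
// "1.8.2" is an assumption; replace it with the parquet-column version
// shipped in your Spark distribution's jars/ directory.
val parquetVersion = "1.8.2"

libraryDependencies ++= Seq(
  "org.apache.parquet" % "parquet-avro" % parquetVersion
)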
We ended up using the open source library parquet4s: https://github.com/mjakubowski84/parquet4s
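For reference, a minimal writing sketch with parquet4s, assuming its 1.x API in which ParquetWriter.writeAndClose takes a target path and an iterable of case class instances (newer releases moved to a builder-style API, so check the project README for the version you pick):

import com.github.mjakubowski84.parquet4s.ParquetWriter

object Parquet4sWriter {

  case class Record(name: String, id: Int)

  def main(args: Array[String]): Unit = {
    // The Parquet schema is derived from the case class itself,
    // so no Avro schema, Hadoop Path or Configuration is needed.
    val records = Seq(Record("nameOne", 1), Record("nameTwo", 2))
    ParquetWriter.writeAndClose("/tmp/parquetResult", records)
  }
}

Spark then reads the resulting file with the same sparkCtx.read.parquet("/tmp/parquetResult") call as above.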