Is it possible to read parquet files from Scala without using Apache Spark?
I found a project which allows us to read and write Avro files using plain Scala.
https://github.com/sksamuel/avro4s
However, I can't find a way to read and write parquet files from a plain Scala program without using Spark.
It's straightforward enough to do using the parquet-mr project, which is the project Alexey Raga is referring to in his answer.
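Before the code, a note on dependencies: the reader below comes from the parquet-avro module of parquet-mr, and you will also want the Hadoop client on the classpath because parquet-mr uses Hadoop's Path and filesystem abstractions even when no cluster is involved. A minimal build.sbt sketch along these lines should be enough (the version numbers are only illustrative, use whatever is current):

// build.sbt -- a minimal sketch; versions are illustrative, not prescriptive
libraryDependencies ++= Seq(
  "org.apache.parquet"  %  "parquet-avro"  % "1.12.3",
  "org.apache.hadoop"   %  "hadoop-client" % "3.3.4",
  "com.sksamuel.avro4s" %% "avro4s-core"   % "1.9.0" // for the case-class conversion shown further down
)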
Some sample code:
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.hadoop.ParquetReader

// path is an org.apache.hadoop.fs.Path pointing at the parquet file
val reader = AvroParquetReader.builder[GenericRecord](path).build().asInstanceOf[ParquetReader[GenericRecord]]
// iter is of type Iterator[GenericRecord]
val iter = Iterator.continually(reader.read).takeWhile(_ != null)
// if you want a list then...
val list = iter.toList
This will return you standard Avro GenericRecords, but if you want to turn those into a Scala case class, then you can use my Avro4s library, as you linked to in your question, to do the marshalling for you. Assuming you are using version 1.30 or higher:
import com.sksamuel.avro4s.RecordFormat

case class Bibble(name: String, location: String)

val format = RecordFormat[Bibble]
// then for a given record
val bibble = format.from(record)
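As an aside, the same RecordFormat works in both directions, so you can also turn a case class back into a GenericRecord if you later want to write data back out. A quick sketch (the Bibble values here are just made up for illustration):

// converting back: case class -> GenericRecord
val record: GenericRecord = format.to(Bibble("fibble", "somewhere"))
// and back again
val roundTripped: Bibble = format.from(record)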
We can obviously combine that with the original iterator in one step:
val reader = AvroParquetReader.builder[GenericRecord](path).build().asInstanceOf[ParquetReader[GenericRecord]]
val format = RecordFormat[Bibble]
// iter is now an Iterator[Bibble]
val iter = Iterator.continually(reader.read).takeWhile(_ != null).map(format.from)
// and list is now a List[Bibble]
val list = iter.toList
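And for anyone who wants the complete picture, here is a minimal self-contained sketch that ties the pieces together. The object name, the file name bibbles.parquet, and the Bibble fields are assumptions for illustration only, and the reader is closed when we are done:

import com.sksamuel.avro4s.RecordFormat
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.hadoop.ParquetReader

object ParquetReadExample extends App {

  case class Bibble(name: String, location: String)

  // "bibbles.parquet" is a made-up file name for illustration
  val reader = AvroParquetReader.builder[GenericRecord](new Path("bibbles.parquet"))
    .build()
    .asInstanceOf[ParquetReader[GenericRecord]]

  val format = RecordFormat[Bibble]

  try {
    // ParquetReader.read returns null at end of file
    val bibbles: List[Bibble] =
      Iterator.continually(reader.read).takeWhile(_ != null).map(format.from).toList
    bibbles.foreach(println)
  } finally {
    reader.close()
  }
}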