
How to read a parquet file in R without using spark packages?

Tags:

r

parquet

Most answers I could find online use sparklyr or other Spark packages, which require spinning up a Spark cluster, and that is an unnecessary overhead. In Python this can be done with pandas.read_parquet or with Apache Arrow; I am looking for something similar in R.

asked May 10 '18 by Gerg

People also ask

Can you read Parquet files in R?

'Parquet' is a columnar storage file format. This function enables you to read Parquet files into R.

How do I extract data from a Parquet file?

With the query results stored in a DataFrame, we can use petl to extract, transform, and load the Parquet data. In this example, we extract Parquet data, sort the data by the Column1 column, and load the data into a CSV file.
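The petl example above is Python; a rough R equivalent of the same extract, sort, and export flow, assuming the arrow package and a hypothetical file named data.parquet, could look like this:

```r
library(arrow)

# Extract: read the Parquet file into a data frame
# ("data.parquet" is a hypothetical file name)
df <- read_parquet("data.parquet")

# Transform: sort by a column ("Column1", as in the example above)
df <- df[order(df$Column1), ]

# Load: write the sorted data out as a CSV file
write.csv(df, "data.csv", row.names = FALSE)
```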

Why does Parquet work better than other formats in Spark?

Parquet has a higher execution speed than other standard file formats such as Avro and JSON, and it also consumes less disk space than Avro and JSON.


1 Answer

You can simply use the arrow package:

install.packages("arrow")        # install once
library(arrow)
read_parquet("myfile.parquet")   # returns a data frame (tibble)
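For larger files, read_parquet can also read just a subset of columns via col_select, or return an Arrow Table instead of an R data frame via as_data_frame = FALSE. A short sketch (the column names below are hypothetical):

```r
library(arrow)

# Read only the columns you need (names here are hypothetical)
df <- read_parquet("myfile.parquet", col_select = c("id", "value"))

# Or keep the data as an Arrow Table rather than copying it
# into an R data frame
tbl <- read_parquet("myfile.parquet", as_data_frame = FALSE)
```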
answered Oct 17 '22 by fc9.30