I'd like to process Apache Parquet files (in my case, generated in Spark) in the R programming language.
Is an R reader available? Or is work being done on one?
If not, what would be the most expedient way to get there? Note: There are Java and C++ bindings: https://github.com/apache/parquet-mr
'Parquet' is a columnar storage file format. The read_parquet() function in the arrow package enables you to read Parquet files into R.
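A minimal sketch, assuming the arrow package is installed from CRAN and the file path is just a placeholder:

# install.packages("arrow")   # once, from CRAN
library(arrow)

# read a Parquet file into an R data frame (returned as a tibble)
df <- read_parquet("/path/to/data.parquet")
head(df)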
There is also a desktop application for viewing Parquet as well as other binary-format data such as ORC and Avro. It's a pure Java application, so it can run on Linux, Mac, and Windows. See Bigdata File Viewer for details. It supports complex data types like array, map, etc.
Regarding encodings in Parquet: the plain encoding stores values back to back and is used as a last resort when there is no more efficient encoding for the given data. The plain encoding always reserves the same amount of space for a given type; for instance, a 32-bit int is always stored in 4 bytes.
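As a rough illustration only (this is not Parquet's actual writer), serializing R integers to raw bytes shows the fixed 4-byte, back-to-back layout that plain encoding uses for int32 values:

# illustrative only: write three 32-bit integers back to back into a raw vector
raw_bytes <- writeBin(c(1L, 2L, 3L), raw(), size = 4L)
length(raw_bytes)  # 12 bytes: 3 values x 4 bytes each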
If you're using Spark, this is now relatively simple with the release of Spark 1.4; see the sample code below, which uses the SparkR package that is now part of the Apache Spark core framework.
# install the SparkR package
devtools::install_github('apache/spark', ref='master', subdir='R/pkg')

# load the SparkR package
library('SparkR')

# initialize sparkContext which starts a new Spark session
sc <- sparkR.init(master="local")

# initialize sqlContext
sq <- sparkRSQL.init(sc)

# load parquet file into a Spark data frame and coerce into R data frame
df <- collect(parquetFile(sq, "/path/to/filename"))

# terminate Spark session
sparkR.stop()
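For what it's worth, here is a rough equivalent for Spark 2.x and later, where the init-style calls above were superseded by the session-based SparkR API (the path is again a placeholder):

library(SparkR)

# start a Spark session (replaces sparkR.init/sparkRSQL.init in Spark 2.x+)
sparkR.session(master = "local")

# read the parquet file into a SparkDataFrame and collect it as an R data frame
df <- collect(read.parquet("/path/to/filename"))

# stop the session
sparkR.session.stop()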
An expanded example is shown @ https://gist.github.com/andyjudson/6aeff07bbe7e65edc665
I'm not aware of any other package that you could use if you weren't using Spark.