Is there an easy way to read pickle files (.pkl) containing pandas DataFrames into R?
One possibility is to export to CSV and have R read the CSV, but that seems really cumbersome because my DataFrames are rather large. Is there an easier way?
Thanks!
Reading pickle files using pandas: the most basic way to read a pickle file is the read_pickle() function. It takes the path of the pickle file as an argument and returns the unpickled object, which is a DataFrame when a DataFrame was pickled.
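A minimal round trip, assuming a small made-up DataFrame and a temporary path chosen only for illustration: to_pickle() writes the object and read_pickle() gives it back unchanged.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data; any picklable pandas object behaves the same way.
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.pkl")
    df.to_pickle(path)               # serialize the DataFrame to disk
    restored = pd.read_pickle(path)  # returns the unpickled object

print(type(restored).__name__)  # DataFrame
print(restored.equals(df))      # True
```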
With pickle.load you should be reading the first object serialized into the file (not the last one, as you've written). After unserializing the first object, the file pointer is at the beginning of the next object; if you simply call pickle.load again, it will read that next object. Keep doing that until the end of the file.
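A sketch of that loop, assuming several objects were dumped one after another into the same file (the helper name load_all and the sample objects are hypothetical). pickle.load raises EOFError once nothing is left to read, which serves as the stop signal.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Yield every object pickled into `path`, in the order they were dumped."""
    with open(path, "rb") as f:
        while True:
            try:
                # Reads one object; the file pointer then sits at the next one.
                yield pickle.load(f)
            except EOFError:
                # Raised when the end of the file has been reached.
                return


# Write three example objects back to back into one file.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    for obj in ({"a": 1}, [1, 2, 3], "last"):
        pickle.dump(obj, f)
    path = f.name

objects = list(load_all(path))
os.remove(path)  # clean up the temporary file
print(objects)   # [{'a': 1}, [1, 2, 3], 'last']
```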
Python pickle.load: to retrieve pickled data, use the pickle.load() function. Its primary argument is the file object you get by opening the file in read-binary (rb) mode.
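A minimal sketch of the basic pattern, using a made-up dictionary as the pickled object: write in binary mode with pickle.dump, then open with "rb" and pass the file object to pickle.load.

```python
import os
import pickle
import tempfile

data = {"rows": 3, "columns": ["id", "value"]}  # example object

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    with open(path, "wb") as f:  # pickles are written in binary mode
        pickle.dump(data, f)

    with open(path, "rb") as f:  # ...and read back in read-binary ("rb") mode
        loaded = pickle.load(f)

print(loaded == data)  # True
```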
Reticulate was quite easy and super smooth as suggested by russellpierce in the comments.
install.packages('reticulate')
After that, I created a Python script based on the examples in the reticulate documentation.
Python file (pickle_reader.py):
import pandas as pd

def read_pickle_file(file):
    # Load the pickled object (e.g. a DataFrame) from the given path
    pickle_data = pd.read_pickle(file)
    return pickle_data
And then my R file looked like:
require("reticulate")
# Run the Python script; the functions it defines become callable from R
source_python("pickle_reader.py")
pickle_data <- read_pickle_file("C:/tsa/dataset.pickle")
This gave me, in R, all the data I had previously stored in pickle format.
You can also do all of this in-line in R without leaving your R editor (provided the Python that reticulate uses can import pandas), e.g.:
library(reticulate)
pd <- import("pandas")
pickle_data <- pd$read_pickle("dataset.pickle")
Edit: If you can install and use the {reticulate} package, then this answer is probably outdated. See the other answers for an easier path.
You could load the pickle in Python and then export it to R via the Python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to Python. I suspect that what you'd want to do next would be to use that session to call R and saveRDS to a file or RAM disk. Then in RStudio you can read that file back in (e.g. with readRDS). Look at the R packages rJython and rPython for ways in which you could trigger the Python commands from R.
Alternatively, you could write a simple Python script to load your data in Python (probably using one of the R packages noted above) and write a formatted data stream to stdout. Then that entire system call to the script (including the argument that specifies your pickle) can be used as an argument to fread in the R package data.table. Alternatively, if you wanted to keep to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
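A sketch of such a script, assuming the pickle holds a pandas DataFrame and that CSV is an acceptable stream format. The file name dump_pickle.py and the function dump_pickle_as_csv are hypothetical, chosen for illustration.

```python
"""dump_pickle.py (hypothetical): write a pickled DataFrame to stdout as CSV.

Usage: python dump_pickle.py path/to/dataset.pkl
"""
import sys

import pandas as pd


def dump_pickle_as_csv(path, out=None):
    """Load the pickle at `path` (assumed to be a DataFrame) and stream it as CSV."""
    if out is None:
        out = sys.stdout
    df = pd.read_pickle(path)
    # index=False keeps the pandas row index out of the stream,
    # so R sees only the DataFrame's own columns.
    df.to_csv(out, index=False)


# Only runs when invoked as a script with exactly one argument (the pickle path).
if __name__ == "__main__" and len(sys.argv) == 2:
    dump_pickle_as_csv(sys.argv[1])
```

On the R side, recent data.table versions accept a shell command via fread(cmd = "python dump_pickle.py dataset.pkl"); with base R only, read.csv(text = system("python dump_pickle.py dataset.pkl", intern = TRUE)) parses the captured lines. Note that CSV drops pandas-specific dtypes, so column types are re-inferred by R.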
As usual, there are /many/ ways to skin this particular cat. The basic steps are:
1. Load the data in Python.
2. Get the data over to R (via rpy2, or by writing it to a file or to stdout in a format R can parse, e.g. with fread).
3. Read the data in R (e.g. with fread; if you used rpy2, then you're already done).