
Reading a pickle file (PANDAS Python Data Frame) in R

Is there an easy way to read pickle files (.pkl) from Pandas Dataframe into R?

One possibility is to export to CSV and have R read the CSV but that seems really cumbersome for me because my dataframes are rather large. Is there an easier way to do so?

Thanks!

asked Feb 01 '16 00:02 by Vincent



2 Answers

Reticulate was quite easy and super smooth as suggested by russellpierce in the comments.

install.packages('reticulate') 

After that, I created a Python script like this, based on the examples in the reticulate documentation.

Python file:

import pandas as pd

def read_pickle_file(file):
    pickle_data = pd.read_pickle(file)
    return pickle_data

And then my R file looked like:

require("reticulate")

source_python("pickle_reader.py")
pickle_data <- read_pickle_file("C:/tsa/dataset.pickle")

This gave me all my data in R stored earlier in pickle format.

You can also do this all in-line in R without leaving your R editor (provided your system python can reach pandas)... e.g.

library(reticulate)
pd <- import("pandas")
pickle_data <- pd$read_pickle("dataset.pickle")

answered Oct 15 '22 14:10 by Ankur Sinha


Edit: If you can install and use the {reticulate} package, then this answer is probably outdated. See the other answer for an easier path.

You could load the pickle in python and then export it to R via the python package rpy2 (or similar). Once you've done so, your data will exist in an R session linked to python. I suspect that what you'd want to do next would be to use that session to call R and saveRDS to a file or RAM disk. Then in RStudio you can read that file back in. Look at the R packages rJython and rPython for ways in which you could trigger the python commands from R.

Alternatively, you could write a simple python script to load your data in Python (probably using one of the R packages noted above) and write a formatted data stream to stdout. Then that entire system call to the script (including the argument that specifies your pickle) can be used as an argument to fread in the R package data.table. Alternatively, if you wanted to keep to standard functions, you could use a combination of system(..., intern=TRUE) and read.table.
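
As a rough sketch of the stdout approach (not something from the original answer), the Python side could look like the script below. The file name pickle_to_csv.py and the helper names write_csv and main are made up for illustration, and it assumes pandas is installed and the pickle holds a DataFrame.

```python
# pickle_to_csv.py (hypothetical name): print a pickled DataFrame as CSV.
import sys


def write_csv(frame, out):
    """Write a DataFrame-like object to an open text handle as CSV."""
    # index=False drops the pandas row index so that R only sees data columns.
    frame.to_csv(out, index=False)


def main(pickle_path):
    # Imported here so that pandas is only required when a pickle is read.
    import pandas as pd

    write_csv(pd.read_pickle(pickle_path), sys.stdout)


# Only act when exactly one argument (the pickle path) was supplied.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

On the R side, something like data.table::fread(cmd = "python pickle_to_csv.py dataset.pickle") should then pick up the stream, assuming python is on your PATH and your data.table version supports the cmd argument. Only unpickle files you trust, since loading a pickle can execute arbitrary code.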

As usual, there are /many/ ways to skin this particular cat. The basic steps are:

  1. Load the data in python
  2. Express the data to R (e.g., exporting the object via rpy2 or writing formatted text to stdout with R ready to receive it on the other end)
  3. Serialize the expressed data in R to an internal data representation (e.g., exporting the object via rpy2 or fread)
  4. (optional) Make the data in that session of R accessible to another R session (i.e., the step to close the loop with rpy2, or if you've been using fread then you're already done).

answered Oct 15 '22 14:10 by russellpierce