Consider this simple example:
import pandas as pd
mydata = pd.DataFrame({'mytime': [pd.to_datetime('2018-01-01 10:00:00.513'),
                                  pd.to_datetime('2018-01-03 10:00:00.513')],
                       'myvariable': [1, 2],
                       'mystring': ['hello', 'world']})
mydata
Out[7]:
  mystring                  mytime  myvariable
0    hello 2018-01-01 10:00:00.513           1
1    world 2018-01-03 10:00:00.513           2
I know I can write that dataframe to msgpack using Pandas:
mydata.to_msgpack('C://Users/john/Documents/mypack')
The problem is: how can I read that msgpack file in R?
Using RcppMsgPack returns some puzzling output that is not a dataframe/tibble:
library(tidyverse)
library(RcppMsgPack)
df <- msgpack_read('C://Users/john/Documents/mypack', simplify = TRUE)
> df
$axes
$axes[[1]]
$axes[[1]]$typ
[1] "index"
$axes[[1]]$name
NULL
$axes[[1]]$klass
[1] "Index"
$axes[[1]]$compress
NULL
$axes[[1]]$data
[1] "mystring" "mytime" "myvariable"
$axes[[1]]$dtype
[1] "object"
$axes[[2]]
$axes[[2]]$typ
[1] "range_index"
$axes[[2]]$name
NULL
$axes[[2]]$klass
[1] "RangeIndex"
$axes[[2]]$start
[1] 0
$axes[[2]]$step
[1] 1
$axes[[2]]$stop
[1] 2
$typ
[1] "block_manager"
$blocks
$blocks[[1]]
$blocks[[1]]$shape
[1] 1 2
$blocks[[1]]$klass
[1] "IntBlock"
$blocks[[1]]$compress
NULL
$blocks[[1]]$values
[1] 01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00
attr(,"EXT")
[1] 0
$blocks[[1]]$locs
$blocks[[1]]$locs$typ
[1] "ndarray"
$blocks[[1]]$locs$dtype
[1] "int64"
$blocks[[1]]$locs$compress
NULL
$blocks[[1]]$locs$ndim
[1] 1
$blocks[[1]]$locs$data
[1] 02 00 00 00 00 00 00 00
attr(,"EXT")
[1] 0
$blocks[[1]]$locs$shape
[1] 1
$blocks[[1]]$dtype
[1] "int64"
$blocks[[2]]
$blocks[[2]]$shape
[1] 1 2
$blocks[[2]]$klass
[1] "DatetimeBlock"
$blocks[[2]]$compress
NULL
$blocks[[2]]$values
[1] 40 02 0e 64 4d a7 05 15 40 02 ac 86 76 44 06 15
attr(,"EXT")
[1] 0
$blocks[[2]]$locs
$blocks[[2]]$locs$typ
[1] "ndarray"
$blocks[[2]]$locs$dtype
[1] "int64"
$blocks[[2]]$locs$compress
NULL
$blocks[[2]]$locs$ndim
[1] 1
$blocks[[2]]$locs$data
[1] 01 00 00 00 00 00 00 00
attr(,"EXT")
[1] 0
$blocks[[2]]$locs$shape
[1] 1
$blocks[[2]]$dtype
[1] "datetime64[ns]"
$blocks[[3]]
$blocks[[3]]$shape
[1] 1 2
$blocks[[3]]$klass
[1] "ObjectBlock"
$blocks[[3]]$compress
NULL
$blocks[[3]]$values
[1] "hello" "world"
$blocks[[3]]$locs
$blocks[[3]]$locs$typ
[1] "ndarray"
$blocks[[3]]$locs$dtype
[1] "int64"
$blocks[[3]]$locs$compress
NULL
$blocks[[3]]$locs$ndim
[1] 1
$blocks[[3]]$locs$data
[1] 00 00 00 00 00 00 00 00
attr(,"EXT")
[1] 0
$blocks[[3]]$locs$shape
[1] 1
$blocks[[3]]$dtype
[1] "object"
$klass
[1] "DataFrame"
What should I do?
Of course, going back from R to Python would also be nice. Thanks!
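The raw vectors in the list above look like NumPy buffers dumped byte for byte. As a rough sketch, assuming the bytes are little-endian 64-bit integers (an assumption about how the file was written, not something the output states), they can be decoded with the Python standard library alone. Each block's `locs` value then gives the position of its column in `$axes[[1]]$data`, and the `datetime64[ns]` values are nanoseconds since the Unix epoch. The names `unpack_int64` and `ns_to_datetime` are helpers made up for this illustration.

```python
import struct
from datetime import datetime, timedelta

# Column labels, copied from $axes[[1]]$data in the R output.
columns = ["mystring", "mytime", "myvariable"]

# (dtype, raw values, raw locs) for the two binary blocks,
# copied from $blocks[[1]] and $blocks[[2]] in the R output.
blocks = [
    ("int64",
     bytes.fromhex("0100000000000000" "0200000000000000"),
     bytes.fromhex("0200000000000000")),
    ("datetime64[ns]",
     bytes.fromhex("40020e644da70515" "4002ac8676440615"),
     bytes.fromhex("0100000000000000")),
]


def unpack_int64(raw):
    """Read raw bytes as consecutive little-endian signed 64-bit integers.

    Little-endian is an assumption; it matches common x86 machines.
    """
    count = len(raw) // 8
    return list(struct.unpack("<%dq" % count, raw))


def ns_to_datetime(ns):
    """Convert nanoseconds since the Unix epoch to a naive datetime.

    Python datetimes only keep microseconds, so finer digits are dropped.
    """
    return datetime(1970, 1, 1) + timedelta(microseconds=ns // 1000)


decoded = {}
for dtype, values, locs in blocks:
    # locs holds the position of this block's column in `columns`.
    name = columns[unpack_int64(locs)[0]]
    numbers = unpack_int64(values)
    if dtype == "datetime64[ns]":
        numbers = [ns_to_datetime(n) for n in numbers]
    decoded[name] = numbers

# The ObjectBlock already arrives as plain strings; its locs bytes decode to 0.
decoded[columns[0]] = ["hello", "world"]

for name in columns:
    print(name, decoded[name])
```

The same idea could be ported to R (for example with `readBin` on the raw vectors), but going through a format both sides read natively is probably less fragile than rebuilding the frame block by block.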
How about using library(reticulate) in R:
library(reticulate)
pyData = py_run_string("import pandas as pd
mydata = pd.DataFrame({'mytime': [pd.to_datetime('2018-01-01 10:00:00.513'),
                                  pd.to_datetime('2018-01-03 10:00:00.513')],
                       'myvariable': [1, 2],
                       'mystring': ['hello', 'world']})")
It would yield the desired output:
pyData$mydata
  mystring              mytime myvariable
1    hello 2018-01-01 10:00:00          1
2    world 2018-01-03 10:00:00          2
You could save all the Python code in a Python file, e.g. mydata.py, and run it with the function py_run_file("mydata.py").
An overview of reticulate can be found here: https://github.com/rstudio/reticulate.
Most interesting for you is probably the description of the type conversions: https://github.com/rstudio/reticulate#type-conversions.
Add-on question - From R to Python:
The type conversion also holds for "sending" data from R to Python, see here: https://rstudio.github.io/reticulate/articles/calling_python.html#sourcing-scripts.
py = py_run_string("def add(x, y):
    return x + y")
py$add(5, 10)
15
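For reference, the string handed to py_run_string has to be valid Python on its own, including the indented function body. A minimal sketch of the same function as plain Python source, which could also be kept in a file and loaded with py_run_file (the file name add.py is only a hypothetical example):

```python
# add.py (hypothetical file name, used only for illustration)

def add(x, y):
    # Return the sum of the two arguments.
    return x + y


# Quick check from the Python side, using the same arguments as the R call.
print(add(5, 10))
```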