I'm wondering how efficiently reticulate handles memory with Python objects.
Suppose I have a 5 GB pandas DataFrame called data_pandas in a reticulate Python session, and I'd like to analyze it in R.
When I access the object from R as py$data_pandas, does reticulate internally copy the DataFrame into an R data.frame (i.e. create another 5 GB data.frame in R)?
And what about the reverse (accessing an R data.frame from Python)?
I'm no expert, but judging from the vignette on arrays, reticulate always copies Python arrays when they are converted to R arrays, while R arrays are shared with Python where possible. At the time the vignette was written, this could mean up to three copies in memory at once: "R arrays are only copied to Python when they need to be, otherwise data are shared. Python arrays are always copied when moved into R arrays. This can sometimes lead to three copies of any one array in memory at any one time (at the moment this was written). Future versions will reduce that copy overhead to two." (From https://rstudio.github.io/reticulate/articles/arrays.html)
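To make the "shared" versus "copied" distinction from that quote concrete, here is a minimal sketch in plain NumPy. It does not involve reticulate at all and says nothing about what reticulate does internally; it only shows what sharing a buffer versus copying it means for memory. The array size and the variable names are my own assumptions, chosen as a small stand-in for a multi-gigabyte object.

```python
import numpy as np

# Small stand-in for a large array: 1,000,000 float64 values (8 bytes each).
original = np.zeros(1_000_000, dtype=np.float64)

# A view is a new array object that points at the same underlying buffer.
view = original.view()

# A copy is a new array object with its own, separately allocated buffer.
copy = original.copy()

shares_view = np.shares_memory(original, view)
shares_copy = np.shares_memory(original, copy)

# Writing through the view changes the original; writing to the copy does not.
view[0] = 1.0
copy[1] = 2.0

# Each independent buffer costs this many bytes; a copy doubles the total.
bytes_per_buffer = original.nbytes

print(shares_view, shares_copy, original[0], original[1], bytes_per_buffer)
```

If Python-to-R conversion always copies, as the vignette says for arrays, then the 5 GB object behaves like `copy` here rather than `view`, so you would need to budget for at least one additional full-size buffer on the R side.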