I am updating a set of functions that previously only accepted data.frame objects to work with data.table arguments.
I decided to implement the functions using R's method dispatch so that the old code using data.frames will still work with the updated functions. In one of my functions, I take in a data.frame as input, modify it, and then return the modified data.frame. I created a data.table implementation as well. For example:
# The functions
foo <- function(d) {
  UseMethod("foo")
}
foo.data.frame <- function(d) {
  # <Do Something>
  return(d)
}
foo.data.table <- function(d) {
  # <Do Something>
  return(d)
}
I know that data.table works by making changes without copying, and I implemented foo.data.table while keeping that in mind. However, I return the data.table object at the end of the function because I want my old scripts to work with the new data.table objects. Will this make a copy of the data.table? How can I check? According to the documentation, one has to be very explicit to create a copy of a data.table, but I am not sure in this case.
The reason I want to return something when I do not have to with data.tables: my old scripts look like this:
someData <- read.table(...)
...
someData <- foo(someData)
I want the scripts to be able to run with data.tables by just changing the data ingest lines. In other words, I want the script to work by just changing someData <- read.table(...) to someData <- fread(...).
Thanks to Arun for his answer in the comments. I will use the example from his comments to answer the question.
One can check whether copies are being made by using the tracemem function to track an object in R. From the help file of the function, ?tracemem, the description says:
This function marks an object so that a message is printed whenever the internal code copies the object. It is a major cause of hard-to-predict memory use in R.
For example:
# Using a data.frame
df <- data.frame(x=1:5, y=6:10)
tracemem(df)
## [1] "<0x32618220>"
df$y[2L] <- 11L
## tracemem[0x32618220 -> 0x32661a98]:
## tracemem[0x32661a98 -> 0x32661b08]: $<-.data.frame $<-
## tracemem[0x32661b08 -> 0x32661268]: $<-.data.frame $<-
df
##   x  y
## 1 1  6
## 2 2 11
## 3 3  8
## 4 4  9
## 5 5 10
# Using a data.table
library(data.table)
dt <- data.table(x=1:5, y=6:10)
tracemem(dt)
## [1] "<0x5fdab40>"
set(dt, i=2L, j=2L, value=11L) # No memory output!
address(dt) # Verify the address in memory is the same
## [1] "0x5fdab40"
dt
##    x  y
## 1: 1  6
## 2: 2 11
## 3: 3  8
## 4: 4  9
## 5: 5 10
The trace shows that the data.frame object is copied when a single element is changed (three copy messages appear in the output above), while the data.table is modified in place without making copies!
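On the "one has to be very explicit to create a copy" point from the question, here is a minimal sketch of the difference between plain assignment and data.table's copy(). The names alias and dup are only illustrative and do not come from the question.

```r
library(data.table)

dt <- data.table(x = 1:5, y = 6:10)

alias <- dt        # plain assignment: a second name for the same object
dup   <- copy(dt)  # explicit copy: an independent data.table

# Modify dt by reference
set(dt, i = 2L, j = "y", value = 11L)

alias$y[2L]                      # the alias sees the change
dup$y[2L]                        # the explicit copy keeps the old value
address(alias) == address(dt)    # same object in memory
address(dup) == address(dt)      # different object in memory
```

So a by-reference change made through one name is visible through every other name bound by plain assignment, and only copy() gives an object that is insulated from it.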
For my question, I can just track the data.table or data.frame object, d, before passing it on to the function, foo, to check if any copies were made.
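A minimal sketch of that check, assuming hypothetical method bodies (the real `<Do Something>` is not shown in the question): the data.table method modifies one cell with set(), the data.frame method with ordinary assignment. It compares address() before and after the call, and also turns on tracemem where the R build supports memory profiling.

```r
library(data.table)

foo <- function(d) UseMethod("foo")

# Hypothetical bodies, standing in for the question's <Do Something>
foo.data.frame <- function(d) {
  d$y[2L] <- 11L                          # ordinary assignment copies d
  return(d)
}
foo.data.table <- function(d) {
  set(d, i = 2L, j = "y", value = 11L)    # modifies d by reference
  return(d)
}

dt <- data.table(x = 1:5, y = 6:10)
df <- data.frame(x = 1:5, y = 6:10)

# tracemem needs an R build with memory profiling enabled
if (capabilities("profmem")) {
  tracemem(dt)
  tracemem(df)
}

dt_before <- address(dt)
df_before <- address(df)

dt <- foo(dt)   # same calling pattern as the old scripts
df <- foo(df)

dt_same <- identical(dt_before, address(dt))   # data.table: same object returned
df_same <- identical(df_before, address(df))   # data.frame: a new object returned
```

In this sketch, returning d from foo.data.table hands back the same object that was passed in, so `someData <- foo(someData)` rebinds the name to the original data.table rather than to a copy.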