Don't want original data.table to be modified when passed to a function

I am a fan of data.table, and of writing reusable functions for current and future needs.

Here's a challenge I ran into while working on the answer to this question: Best way to plot automatically all data.table columns using ggplot2

We pass a data.table to a function for plotting, and the original data.table gets modified, even though we made what looks like a copy of it inside the function to prevent that.

Here's some simple code to illustrate:

plotYofX <- function(.dt,x,y) {
  dt <- .dt
  dt[, (c(x,y)) := lapply(.SD, function(x) {as.numeric(x)}), .SDcols = c(x,y)]
  ggplot(dt) + geom_step(aes(x=get(names(dt)[x]), y=get(names(dt)[y]))) + labs(x=names(dt)[x], y=names(dt)[y])
}


> dtDiamonds <- data.table(ggplot2::diamonds[2:5,1:3]); 
> dtDiamonds
   carat     cut color
   <num>   <ord> <ord>
1:  0.21 Premium     E
2:  0.23    Good     E
3:  0.29 Premium     I
4:  0.31    Good     J

> plotYofX(dtDiamonds,1,2); 
> dtDiamonds
    carat   cut color
    <num> <num> <ord>
1:  0.21     4     E
2:  0.23     2     E
3:  0.29     4     I
4:  0.31     2     J

I've seen many posts on various issues related to using := inside a function, but could not find any that helped me resolve this seemingly very easy issue. (Of course, I don't want to convert it back to a data.frame to achieve the desired outcome.)

IVIM asked Oct 17 '22 10:10


2 Answers

Thanks to the comments and answers: this is the simplest fix for this particular function (there is no need to introduce an additional .dt variable at all):

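plotYofX <- function(dt, x, y) {
  cols <- names(dt)[c(x, y)]
  # No := here: j returns a new two-column data.table, so the caller's dt is untouched.
  # The result must be assigned, otherwise the numeric conversion is discarded.
  plotDt <- dt[, lapply(.SD, as.numeric), .SDcols = cols]
  ggplot(plotDt) + geom_step(aes(x = get(cols[1]), y = get(cols[2]))) + labs(x = cols[1], y = cols[2])
}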

However, it was also important to learn that with data.table, assigning with the regular <- sign does not make a copy: both names refer to the same table, so a := through either name changes it. Use copy(dt) when you need an independent copy, so that the original data.table is not modified.
This is discussed in detail here: Understanding exactly when a data.table is a reference to (vs a copy of) another data.table
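
To make the difference concrete, here is a minimal sketch on a small made-up table (the names original, alias, independent and the grade column are illustrative only, not from the question). It shows that := through a name created with <- is visible in the original, while := on a copy() is not.

```r
library(data.table)

# Toy table, invented for this illustration
original <- data.table(id = 1:3, grade = factor(c("b", "a", "b")))

alias <- original                      # `<-` binds a second name to the same table
alias[, grade := as.numeric(grade)]    # := modifies by reference
changedThroughAlias <- is.numeric(original$grade)
changedThroughAlias                    # TRUE: `original` was modified too

# Start again from a fresh table and use copy() instead
original <- data.table(id = 1:3, grade = factor(c("b", "a", "b")))
independent <- copy(original)          # deep copy
independent[, grade := as.numeric(grade)]
is.factor(original$grade)              # TRUE: `original` is untouched
```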

IVIM answered Oct 21 '22 01:10


Try:

dt <- copy(.dt)

It should work well.
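
For reference, this is roughly how the function from the question might look with that one-line change applied. It is a sketch rather than a tested drop-in: the cols helper variable and the use of the .data pronoun in aes() are my own choices, not part of the original code.

```r
library(data.table)
library(ggplot2)

plotYofX <- function(.dt, x, y) {
  dt <- copy(.dt)                      # deep copy: := below only touches dt
  cols <- names(dt)[c(x, y)]           # helper variable, not in the original
  dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
  ggplot(dt) +
    geom_step(aes(x = .data[[cols[1]]], y = .data[[cols[2]]])) +
    labs(x = cols[1], y = cols[2])
}

dtDiamonds <- data.table(ggplot2::diamonds[2:5, 1:3])
p <- plotYofX(dtDiamonds, 1, 2)
class(dtDiamonds$cut)                  # still an ordered factor after the call
```

The copy costs memory proportional to the table size, so for large tables it may be preferable to copy only the columns being plotted.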

filius_arator answered Oct 21 '22 03:10