I have a list of data.tables that I need to cbind, however, I only need the last X columns.
My data is structured as follows:
DT.1 <- data.table(x=c(1,1), y = c("a","a"), v1 = c(1,2), v2 = c(3,4))
DT.2 <- data.table(x=c(1,1), y = c("a","a"), v3 = c(5,6))
DT.3 <- data.table(x=c(1,1), y = c("a","a"), v4 = c(7,8), v5 = c(9,10), v6 = c(11,12))
DT.list <- list(DT.1, DT.2, DT.3)
>DT.list
    [[1]]
   x y v1 v2
1: 1 a  1  3
2: 1 a  2  4
[[2]]
   x y v3
1: 1 a  5
2: 1 a  6
[[3]]
   x y v4 v5 v6
1: 1 a  7  9 11
2: 1 a  8 10 12
Columns x and y are the same for each of the data.tables but the amount of columns differs. The output should not include duplicate x, and y columns. It should look as follows:
   x y v1 v2 v3 v4 v5 v6
1: 1 a  1  3  5  7  9 11
2: 1 a  2  4  6  8 10 12
I want to avoid using a loop. I am able to bind the data.tables using do.call("cbind", DT.list) and then remove the duplicates manually, but is there a way where the duplicates aren't created in the first place? Also, efficiency is important since the lists can be long with large data.tables.
thanks
Here's another way:
Reduce(
  function(x,y){
    newcols = setdiff(names(y),names(x))
    x[,(newcols)] <- y[, ..newcols]
    x
  }, 
  DT.list,
  init = copy(DT.list[[1]][,c("x","y")])
)
#    x y v1 v2 v3 v4 v5 v6
# 1: 1 a  1  3  5  7  9 11
# 2: 1 a  2  4  6  8 10 12
This avoids modifying the list (as @bgoldst's <- NULL assignment does) or making copies of every element of the list (as, I think, the lapply approach does). I would probably do the <- NULL thing in most practical applications, though.
Here's how it could be done in one shot, using lapply() to remove columns x and y from second-and-subsequent data.tables before calling cbind():
do.call(cbind,c(DT.list[1],lapply(DT.list[2:length(DT.list)],`[`,j=-c(1,2))));
##    x y v1 v2 v3 v4 v5 v6
## 1: 1 a  1  3  5  7  9 11
## 2: 1 a  2  4  6  8 10 12
Another approach is to remove columns x and y from second-and-subsequent data.tables before doing a straight cbind(). I think there's nothing wrong with using a for loop for this:
for (i in seq_along(DT.list)[-1]) DT.list[[i]][,c('x','y')] <- NULL;
DT.list;
## [[1]]
##    x y v1 v2
## 1: 1 a  1  3
## 2: 1 a  2  4
##
## [[2]]
##    v3
## 1:  5
## 2:  6
##
## [[3]]
##    v4 v5 v6
## 1:  7  9 11
## 2:  8 10 12
##
do.call(cbind,DT.list);
##    x y v1 v2 v3 v4 v5 v6
## 1: 1 a  1  3  5  7  9 11
## 2: 1 a  2  4  6  8 10 12
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With