Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep existing columns when expanding data.table object by groups

Tags:

r

data.table

I have a data.table object which I expand by id based on the date variables start.time and stop.time. The code listed below works - my question is on how to make my code easier to work with.

How to avoid manually listing the existing variables of the data.table object in the statement, i.e. "start.time" and "stop.time" in test <- DT[, list(start.time, stop.time, time=seq(start.time, stop.time, by="1 month")), by=id]. Manually adding the variables is annoying if you assume that I have not only two variables but several dozens. The aim is to include all columns that exist in the original data.table object.

I tried to add .SD and setdiff(names(DT), "id") instead of the variable names, but this threw errors.

DT <- data.table(id=sample(1000), 
                 start.time=sample(seq(as.Date("01-01-2010", format="%d-%m-%Y"), as.Date("31-12-2010", format="%d-%m-%Y"), by="day"), 1000, replace=TRUE),
                 stop.time=sample(seq(as.Date("01-01-2011", format="%d-%m-%Y"), as.Date("31-12-2011", format="%d-%m-%Y"), by="day"), 1000, replace=TRUE))

# > DT
#        id start.time  stop.time
#    1: 105 2010-12-16 2011-07-30
#    2: 837 2010-01-15 2011-01-19
#    3: 155 2010-11-30 2011-10-07
#    4: 125 2010-03-31 2011-04-19
#    5: 192 2010-07-13 2011-03-27
#   ---                          
#  996: 749 2010-07-19 2011-02-18
#  997: 660 2010-06-28 2011-10-25
#  998: 808 2010-03-22 2011-12-04
#  999: 291 2010-09-03 2011-06-08
# 1000: 741 2010-01-25 2011-01-23


test <- DT[, list(start.time, stop.time, time=seq(start.time, stop.time, by="1 month")), by=id]

# > test
#         id start.time  stop.time       time
#     1: 903 2010-11-08 2011-10-22 2010-11-08
#     2: 903 2010-11-08 2011-10-22 2010-12-08
#     3: 903 2010-11-08 2011-10-22 2011-01-08
#     4: 903 2010-11-08 2011-10-22 2011-02-08
#     5: 903 2010-11-08 2011-10-22 2011-03-08
#    ---                                     
# 12346: 229 2010-09-11 2011-10-02 2011-05-11
# 12347: 229 2010-09-11 2011-10-02 2011-06-11
# 12348: 229 2010-09-11 2011-10-02 2011-07-11
# 12349: 229 2010-09-11 2011-10-02 2011-08-11
# 12350: 229 2010-09-11 2011-10-02 2011-09-1
like image 545
majom Avatar asked Sep 30 '22 20:09

majom


1 Answers

You can use .SD like this:

DT[, c(.SD, list(time=seq(start.time, stop.time, by="1 month"))), by=id]
like image 186
Roland Avatar answered Oct 03 '22 14:10

Roland