
Expand rows by date range using start and end date

Tags:

r

Consider a data frame of the form

           idnum      start        end
    1993.1    17 1993-01-01 1993-12-31
    1993.2    17 1993-01-01 1993-12-31
    1993.3    17 1993-01-01 1993-12-31

with start and end being of type Date

     $ idnum : int  17 17 17 17 27 27
     $ start : Date, format: "1993-01-01" "1993-01-01" "1993-01-01" "1993-01-01" ...
     $ end   : Date, format: "1993-12-31" "1993-12-31" "1993-12-31" "1993-12-31" ...

I would like to create a new data frame that instead has, for every original row, one observation per month between start and end (including the boundaries):

Desired Output

    idnum       month
       17  1993-01-01
       17  1993-02-01
       17  1993-03-01
    ...
       17  1993-11-01
       17  1993-12-01

I'm not sure what format month should have. At some point I will want to group by idnum and month for regressions on the rest of the data set.

So far, for a single row, seq(from=test[1,'start'], to=test[1, 'end'], by='1 month') gives me the right sequence, but as soon as I try to apply that to the whole data frame, it fails:

    > foo <- apply(test, 1, function(x) seq(x['start'], to=x['end'], by='1 month'))
    Error in to - from : non-numeric argument to binary operator
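A likely cause of this error, sketched under the assumption that `test` looks like the sample above: `apply()` works on `as.matrix(test)`, and with a Date column present that matrix is of type character, so `seq()` receives strings instead of Dates. The data values below are a hypothetical reconstruction, and `months_by_row` is an illustrative name.

```r
# Hypothetical reconstruction of the sample data; only the column
# names and types are taken from the question.
test <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-01-01")),
  end   = as.Date(c("1993-12-31", "1993-03-31"))
)

# apply(test, 1, ...) operates on this matrix. Mixing integer and Date
# columns forces everything to character, so the Date class is lost.
m <- as.matrix(test)
typeof(m)

# Indexing the original columns row by row keeps the Date class,
# so seq() dispatches to its Date method.
months_by_row <- lapply(seq_len(nrow(test)), function(i) {
  seq(test$start[i], test$end[i], by = "1 month")
})
lengths(months_by_row)
```

Each element of `months_by_row` is a Date vector holding the month starts for one input row.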
FooBar asked Jul 17 '14 12:07


1 Answer

Using data.table:

    require(data.table) ## 1.9.2+
    setDT(df)[ , list(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

    # you may use dot notation as a shorthand alias of list in j:
    setDT(df)[ , .(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

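setDT converts df to a data.table by reference. Then, grouping by row with by = 1:nrow(df), we create idnum and month as required: seq builds the monthly sequence for that row, and idnum is recycled to the same length.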
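For readers without data.table, the same row-wise expansion can be sketched in base R. This is a minimal illustration rather than a tuned implementation: the sample values, the names `pieces` and `expanded`, and the `month_label` column are all assumptions added here. The label column is one possible response to the question about which format month should have.

```r
# Illustrative data with the same columns as the question's data frame.
df <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-02-01")),
  end   = as.Date(c("1993-12-31", "1993-04-30"))
)

# Build one small data frame per input row, then stack them.
# idnum is a single value and is recycled along the month sequence.
pieces <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(
    idnum = df$idnum[i],
    month = seq(df$start[i], df$end[i], by = "month")
  )
})
expanded <- do.call(rbind, pieces)

# month stays a Date (first day of each month), which already works as
# a grouping key; a "YYYY-MM" string is an optional alternative.
expanded$month_label <- format(expanded$month, "%Y-%m")
```

Stacking with `do.call(rbind, ...)` is fine for small inputs. For large data frames the data.table version above is likely to be faster.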

Arun answered Oct 04 '22 11:10