Consider a data frame of the form

           idnum      start        end
    1993.1    17 1993-01-01 1993-12-31
    1993.2    17 1993-01-01 1993-12-31
    1993.3    17 1993-01-01 1993-12-31

with start and end being of type Date:

     $ idnum : int  17 17 17 17 27 27
     $ start : Date, format: "1993-01-01" "1993-01-01" "1993-01-01" "1993-01-01" ...
     $ end   : Date, format: "1993-12-31" "1993-12-31" "1993-12-31" "1993-12-31" ...

I would like to create a new data frame that instead has monthly observations for every row, one for every month between start and end (including the boundaries):
Desired Output

    idnum      month
       17 1993-01-01
       17 1993-02-01
       17 1993-03-01
    ...
       17 1993-11-01
       17 1993-12-01

I'm not sure what format month should have; at some point I will want to group by idnum, month for regressions on the rest of the data set.
So far, for every single row, seq(from=test[1,'start'], to=test[1, 'end'], by='1 month') gives me the right sequence, but as soon as I try to apply that to the whole data frame, it does not work:

    > foo <- apply(test, 1, function(x) seq(x['start'], to=x['end'], by='1 month'))
    Error in to - from : non-numeric argument to binary operator

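A likely cause of this error, assuming a data frame shaped like the one above: apply() first converts the data frame to a matrix, and since a matrix holds a single type, the Date columns reach the function as character strings that seq() cannot do arithmetic on. A minimal sketch with made-up toy data (the name test mirrors the question; the values are illustrative only):

```r
# Toy data in the shape described in the question (values are illustrative).
test <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-03-01")),
  end   = as.Date(c("1993-12-31", "1993-05-31"))
)

# apply() works on as.matrix(test); mixing integer and Date columns
# yields a character matrix, so the Date class is lost.
m <- as.matrix(test)
is.character(m)          # TRUE: every cell is now a string

# Indexing the data frame directly keeps the Date class intact,
# which is why the single-row call works.
class(test[1, "start"])  # "Date"
months_row1 <- seq(test[1, "start"], test[1, "end"], by = "1 month")
length(months_row1)      # 12 monthly dates for the first row
```

So any row-wise approach needs to keep the columns as Date, for example by iterating over row indices or grouping by row rather than going through apply().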
Using data.table:

    require(data.table) ## 1.9.2+
    setDT(df)[ , list(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

    # you may use dot notation as a shorthand alias of list in j:
    setDT(df)[ , .(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

setDT converts df to a data.table. Then for each row, by = 1:nrow(df), we create idnum and month as required.
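To try this end to end, here is a small self-contained sketch. The toy data and the name monthly are assumptions for illustration and not part of the original question; only the setDT call itself comes from the answer above.

```r
library(data.table)

# Toy data in the shape described in the question (values are illustrative).
df <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-03-01")),
  end   = as.Date(c("1993-12-31", "1993-05-31"))
)

# One group per row, so seq() sees a single start and end date at a time.
monthly <- setDT(df)[ , .(idnum = idnum, month = seq(start, end, by = "month")),
                      by = 1:nrow(df)]

# Keep only the two requested columns (drops the grouping column added by `by`).
monthly <- monthly[ , .(idnum, month)]

nrow(monthly)          # 12 months for idnum 17 plus 3 for idnum 27
class(monthly$month)   # month is still a Date
```

Since month stays a Date, it should be usable directly as a grouping key together with idnum later on; whether that is the most convenient format for the regressions depends on the rest of the data set.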