Given start date and end date, reshape/expand data for each day between (each day on a row) [duplicate]

Tags: date, r, diff

I have spent a lot of time trying to get every day between each start and end date in R. So far I have:

start <- as.Date(c("2013-02-26", "2013-03-26","2013-04-01","2013-04-26","2013-05-26"))
end <- as.Date(c("2013-03-25","2013-03-31","2013-04-25","2013-05-25","2013-06-25"))
per_cost <- c(3451380,3767052,3726900,4076868,3575311)
x <- data.frame(START_DAY = start, END_DAY = end, PER_COST = per_cost)
x$DIF_DAYS <- x$END_DAY - x$START_DAY

Then, I got this:

    START_DAY    END_DAY PER_COST DIF_DAYS
1 2013-02-26 2013-03-25  3451380  27 days
2 2013-03-26 2013-03-31  3767052   5 days
3 2013-04-01 2013-04-25  3726900  24 days
4 2013-04-26 2013-05-25  4076868  29 days
5 2013-05-26 2013-06-25  3575311  30 days
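For reference, END_DAY - START_DAY does not count the start day, so each period expands to DIF_DAYS + 1 daily rows. A small base R check, rebuilding the start and end vectors from above so it runs on its own (n_days is just an illustrative name):

```r
start <- as.Date(c("2013-02-26", "2013-03-26", "2013-04-01", "2013-04-26", "2013-05-26"))
end   <- as.Date(c("2013-03-25", "2013-03-31", "2013-04-25", "2013-05-25", "2013-06-25"))

# end - start is a difftime that excludes the start day,
# so add 1 to count both endpoints of each period.
n_days <- as.integer(end - start) + 1L
n_days       # 28  6 25 30 31
sum(n_days)  # 120 rows expected in the expanded data
```

The total of 120 matches the number of rows in the data.table output further down.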

I would like one row per day, each carrying the PER_COST of the period it falls in:

DATE       PER_COST
2013-02-26 3451380
2013-02-27 3451380
2013-02-28 3451380
2013-03-01 3451380
...
2013-03-25 3451380
2013-03-26 3767052
2013-03-27 3767052
2013-03-28 3767052

How can I do this?

asked Mar 17 '23 08:03 by K.I.N


2 Answers

Using data.table

library(data.table)
setDT(x)[, list(DATE=seq(START_DAY, END_DAY, by = 'day')), PER_COST]
#    PER_COST       DATE
# 1:  3451380 2013-02-26
# 2:  3451380 2013-02-27
# 3:  3451380 2013-02-28
# 4:  3451380 2013-03-01
# 5:  3451380 2013-03-02
#---                    
#116:  3575311 2013-06-21
#117:  3575311 2013-06-22
#118:  3575311 2013-06-23
#119:  3575311 2013-06-24
#120:  3575311 2013-06-25

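If PER_COST contains duplicate values, grouping by it puts several periods into one group, and seq() fails because it needs a single start and end date. In that case, group by row number (1:nrow(x)) instead: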

setDT(x)[, list(DATE=seq(START_DAY, END_DAY, by = 'day'), 
      PER_COST=rep(PER_COST, END_DAY-START_DAY+1)), 1:nrow(x)]
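To see why duplicates matter: a group holding two periods passes length-2 START_DAY and END_DAY vectors to seq(), which only accepts one from/to date. A minimal base R illustration using hypothetical dates (not from the question):

```r
# Two hypothetical periods that share the same PER_COST, so grouping by
# PER_COST would hand both start dates to seq() in one call.
starts <- as.Date(c("2013-01-01", "2013-02-01"))
ends   <- as.Date(c("2013-01-05", "2013-02-03"))

res <- try(seq(starts, ends, by = "day"), silent = TRUE)
inherits(res, "try-error")  # TRUE: seq() needs length-1 'from' and 'to'

# Grouping by row instead gives each seq() call one start and one end.
per_row <- lapply(seq_along(starts), function(i) seq(starts[i], ends[i], by = "day"))
lengths(per_row)  # 5 3
```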

Update

Using dplyr

library(dplyr)
x %>%
  rowwise() %>%
  do(data.frame(DATE = seq(.$START_DAY, .$END_DAY, by = 'day'),
                PER_COST = rep(.$PER_COST, .$END_DAY - .$START_DAY + 1)))
answered Mar 18 '23 22:03 by akrun
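The rowwise()/do() pattern builds one small data frame per row and then stacks them. The same idea can be sketched in base R with lapply() over row indices and do.call(rbind, ...); this is an illustrative equivalent rather than part of either answer, and it rebuilds x from the question so it runs standalone:

```r
x <- data.frame(
  START_DAY = as.Date(c("2013-02-26", "2013-03-26", "2013-04-01", "2013-04-26", "2013-05-26")),
  END_DAY   = as.Date(c("2013-03-25", "2013-03-31", "2013-04-25", "2013-05-25", "2013-06-25")),
  PER_COST  = c(3451380, 3767052, 3726900, 4076868, 3575311)
)

# One data frame per row: every date from START_DAY to END_DAY paired with
# that row's PER_COST (data.frame() recycles the length-1 value).
pieces <- lapply(seq_len(nrow(x)), function(i) {
  data.frame(DATE = seq(x$START_DAY[i], x$END_DAY[i], by = "day"),
             PER_COST = x$PER_COST[i])
})

# Stack the pieces into a single long data frame.
res <- do.call(rbind, pieces)
head(res, 3)
```

Indexing x by row number (rather than passing rows through apply()) keeps DATE as a Date and PER_COST as numeric.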


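With base R only, you could loop over the rows with apply(). Note that apply() turns each row into a character vector, so the dates and the cost are converted back explicitly:

do.call(rbind, apply(x, 1, function(row)
  data.frame(DATE = seq.Date(from = as.Date(row["START_DAY"]), to = as.Date(row["END_DAY"]), by = "day"),
             PER_COST = as.numeric(row["PER_COST"]), row.names = NULL))
)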
# 1.1  2013-02-26  3451380
# 1.2  2013-02-27  3451380
# 1.3  2013-02-28  3451380
# 1.4  2013-03-01  3451380
# 1.5  2013-03-02  3451380
# 1.6  2013-03-03  3451380
# 1.7  2013-03-04  3451380
answered Mar 18 '23 22:03 by lukeA
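Both answers build the result one period at a time. A base R sketch that avoids per-row data frames (an alternative, not taken from either answer) builds the two columns directly: Map() calls seq() on each START_DAY/END_DAY pair, and rep() repeats each PER_COST once per day of its period. The data frame is rebuilt from the question so the block runs on its own; n_days, dates, and out are illustrative names:

```r
x <- data.frame(
  START_DAY = as.Date(c("2013-02-26", "2013-03-26", "2013-04-01", "2013-04-26", "2013-05-26")),
  END_DAY   = as.Date(c("2013-03-25", "2013-03-31", "2013-04-25", "2013-05-25", "2013-06-25")),
  PER_COST  = c(3451380, 3767052, 3726900, 4076868, 3575311)
)

# Days per period, counting both START_DAY and END_DAY.
n_days <- as.integer(x$END_DAY - x$START_DAY) + 1L

# Map() calls seq(START_DAY[i], END_DAY[i], by = "day") for each row;
# do.call(c, ...) joins the resulting Date vectors into one.
dates <- do.call(c, Map(seq, x$START_DAY, x$END_DAY, by = "day"))

# rep() repeats each PER_COST once for every day of its period.
out <- data.frame(DATE = dates, PER_COST = rep(x$PER_COST, times = n_days))
head(out, 4)
```

Because both columns are built as whole vectors, DATE stays a Date and PER_COST stays numeric, and duplicate PER_COST values need no special handling.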