Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional sum with dates in column names

Want to calculate conditional sum based on specified dates in r. My sample df is

start_date = c("7/24/2017", "7/1/2017", "7/25/2017")
end_date   = c("7/27/2017", "7/4/2017", "7/28/2017")
`7/23/2017` = c(1,5,1)
`7/24/2017` = c(2,0,2)
`7/25/2017` = c(0,0,10)
`7/26/2017` = c(2,2,2)
`7/27/2017` = c(0,0,0)
df = data.frame(start_date,end_date,`7/23/2017`,`7/24/2017`,`7/25/2017`,`7/26/2017`,`7/27/2017`)

In Excel it looks like:

enter image description here

I want to perform calculations as specified in Column H which is a conditional sum of columns C through G based on the dates specified in columns A and B.

Apparently, Excel allows columns to be dates but not R.

like image 745
Jennifer Avatar asked Mar 14 '26 00:03

Jennifer


2 Answers

  #wide to long format
  dat <- reshape(df, direction="long", varying=list(names(df)[3:7]), v.names="Value", 
          idvar=c("start_date","end_date"), timevar="Date",
          times=seq(as.Date("2017/07/23"),as.Date("2017/07/27"), "day"))

  #convert from factor to date class
  dat$end_date <- as.Date(dat$end_date, format = "%m/%d/%Y")
  dat$start_date <- as.Date(dat$start_date, format = "%m/%d/%Y")

  library(dplyr)
  dat %>% group_by(start_date, end_date) %>% 
          mutate(mval = ifelse(between(Date, start_date, end_date), Value, 0)) %>% 
          summarise(conditional_sum=sum(mval)) 

# # A tibble: 3 x 3 
# # Groups:   start_date [?] 
#   start_date   end_date conditional_sum 
#       <date>     <date>           <dbl> 
# 1 2017-07-01 2017-07-04               0 
# 2 2017-07-24 2017-07-27               4 
# 3 2017-07-25 2017-07-28              12
like image 165
M-- Avatar answered Mar 15 '26 14:03

M--


You could achieve that as follows:

# number of trailing columns without numeric values
c = 2

# create a separate vector with the dates
dates = as.Date(gsub("X","",tail(colnames(df),-c)),format="%m.%d.%Y")

# convert date columns in dataframe
df$start_date = as.Date(df$start_date,format="%m/%d/%Y")
df$end_date = as.Date(df$end_date,format="%m/%d/%Y")

# calculate sum
sapply(1:nrow(df),function(x) {y = df[x,(c+1):ncol(df)][dates %in% 
  seq(df$start_date[x],df$end_date[x],by="day")  ]; ifelse(length(y)>0,sum(y),0) })

returns:

[1]  4  0 12

Hope this helps!

like image 31
Florian Avatar answered Mar 15 '26 14:03

Florian



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!