Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R resetting a cumsum to zero at the start of each year

Tags:

r

plyr

I have a dataframe with a bunch of donations data. I take the data and arrange it in time order from oldest to most recent gifts. Next I add a column containing a cumulative sum of the gifts over time. The data has multiple years of data and I was looking for a good way to reset the cumsum to 0 at the start of each year (the year starts and ends July 1st for fiscal purposes).

This is how it currently is:

id        date          giftamt      cumsum()
005       01-05-2001     20.00        20.00
007       06-05-2001     25.00        45.00
009       12-05-2001     20.00        65.00
012       02-05-2002     30.00        95.00
015       08-05-2002     50.00       145.00
025       12-05-2002     25.00       170.00
...          ...          ...         ...

this is how I would like it to look:

id        date          giftamt      cumsum()
005       01-05-2001     20.00        20.00
007       06-05-2001     25.00        45.00
009       12-05-2001     20.00        20.00
012       02-05-2002     30.00        50.00
015       08-05-2002     50.00        50.00
025       12-05-2002     25.00        75.00
...          ...          ...          ...

Any suggestions?

UPDATE:

Here's the code that finally worked courtesy of Seb :

#tweak for changing the calendar year to fiscal year
df$year <- as.numeric(format(as.Date(df$giftdate), format="%Y"))
df$month <- as.numeric(format(as.Date(df$giftdate), format="%m"))
df$year <- ifelse(df$month<=6, df$year, df$year+1)

#cum-summing :)
library(plyr)
finalDf <- ddply(df, .(year), summarize, cumsum(as.numeric(as.character(giftamt))))
like image 936
screechOwl Avatar asked Dec 16 '11 15:12

screechOwl


3 Answers

i would try it this way (df being the dataframe):

#tweak for changing the calendar year to fiscal year
df$year <- format(as.Date(df$date), format="%Y")
df$month <- format(as.Date(df$date), format="%m")
df$year <- ifelse(df$month<=6, year, year+1)

#cum-summing :)
library(plyr)
ddply(df, .(year), summarize, cumsum(giftamt))
like image 84
Seb Avatar answered Nov 09 '22 01:11

Seb


There are two tasks: create a column in the data frame representing each year, then split the data, apply the cumsum, and recombine. R has lots ways of doing both parts.

Probably the most readable way of dong the first task is with year from the lubridate package.

library(lubridate)
df$year <- year(df$date)

Note that R has lots of date formats, so it's worth checking to see whether you are currently using POSIXct or Date or chron or zoo or xts or one of the other formats.

Seb's choice or ddply for the second task is the one I'd recommend. For completeness, you can also use tapply or aggregate.

with(df, tapply(giftamt, year, cumsum))
aggregate(giftamt ~ year, df, cumsum)

With the new info that you want years to change on 1st July, update the year column to

df$year <- with(df, year(date) + (month(date) >= 7))
like image 3
Richie Cotton Avatar answered Nov 09 '22 02:11

Richie Cotton


gifts <- read.table("gifts.txt", header=T, quote="\"")
NbGifts <- nrow(gifts)

# Determination of the relevant fiscal year ending dates
CalYear <- as.numeric(substr(gifts$date,7,10)) # calendar years
TCY <- as.numeric(names(table(CalYear))) # list of calendar years
MDFY <- "07-01-" # ending date for the current fiscal year
EFY <- paste(MDFY,TCY,sep="") # list of fiscal year ending dates
EFYplus <- cbind(TCY,EFY) # table of fiscal year ending dates
colnames(EFYplus) <- c("CalYear","EndDate")

# Manipulation of data frames in order to match
# the fiscal year end dates to the relevant dates
giftsPlusYear <- data.frame(CalYear, gifts, stringsAsFactors = FALSE)
giftsPlusEFY <- merge(giftsPlusYear,EFYplus) # using the CalYear

# Date comparison in order to associate a gift to its fiscal year
DateGift <- as.Date(giftsPlusEFY$date,"%m-%d-%y") # date conversion for comparison
DateEFY <- as.Date(giftsPlusEFY$EndDate,"%m-%d-%y")
FiscYear <- ifelse(DateGift<DateEFY,giftsPlusEFY$CalYear,giftsPlusEFY$CalYear+1)

# Computation of cumulative totals per fiscal year
LastFY <- 0
CumGift <- rep(0,NbGifts)
for (g in 1:NbGifts){
  if (LastFY==FiscYear[g]){
    CumGift[g] <- CumGift[g-1] + gifts$giftamt[g]
    } else {
      CumGift[g] <- gifts$giftamt[g]
      LastFY <- FiscYear[g]
    }
}
(CumGifts <- cbind(gifts,CumGift))
like image 1
Jean-Victor Côté Avatar answered Nov 09 '22 01:11

Jean-Victor Côté