I have a dataframe
with a bunch of donations data. I take the data and arrange it in time order from oldest to most recent gifts. Next I add a column containing a cumulative sum of the gifts over time. The data has multiple years of data and I was looking for a good way to reset the cumsum
to 0 at the start of each year (the year starts and ends July 1st for fiscal purposes).
This is how it currently is:
id date giftamt cumsum()
005 01-05-2001 20.00 20.00
007 06-05-2001 25.00 45.00
009 12-05-2001 20.00 65.00
012 02-05-2002 30.00 95.00
015 08-05-2002 50.00 145.00
025 12-05-2002 25.00 170.00
... ... ... ...
this is how I would like it to look:
id date giftamt cumsum()
005 01-05-2001 20.00 20.00
007 06-05-2001 25.00 45.00
009 12-05-2001 20.00 20.00
012 02-05-2002 30.00 50.00
015 08-05-2002 50.00 50.00
025 12-05-2002 25.00 75.00
... ... ... ...
Any suggestions?
UPDATE:
Here's the code that finally worked courtesy of Seb :
#tweak for changing the calendar year to fiscal year
df$year <- as.numeric(format(as.Date(df$giftdate), format="%Y"))
df$month <- as.numeric(format(as.Date(df$giftdate), format="%m"))
df$year <- ifelse(df$month<=6, df$year, df$year+1)
#cum-summing :)
library(plyr)
finalDf <- ddply(df, .(year), summarize, cumsum(as.numeric(as.character(giftamt))))
i would try it this way (df being the dataframe):
#tweak for changing the calendar year to fiscal year
df$year <- format(as.Date(df$date), format="%Y")
df$month <- format(as.Date(df$date), format="%m")
df$year <- ifelse(df$month<=6, year, year+1)
#cum-summing :)
library(plyr)
ddply(df, .(year), summarize, cumsum(giftamt))
There are two tasks: create a column in the data frame representing each year, then split the data, apply the cumsum, and recombine. R has lots ways of doing both parts.
Probably the most readable way of dong the first task is with year
from the lubridate
package.
library(lubridate)
df$year <- year(df$date)
Note that R has lots of date formats, so it's worth checking to see whether you are currently using POSIXct
or Date
or chron
or zoo
or xts
or one of the other formats.
Seb's choice or ddply
for the second task is the one I'd recommend. For completeness, you can also use tapply
or aggregate
.
with(df, tapply(giftamt, year, cumsum))
aggregate(giftamt ~ year, df, cumsum)
With the new info that you want years to change on 1st July, update the year column to
df$year <- with(df, year(date) + (month(date) >= 7))
gifts <- read.table("gifts.txt", header=T, quote="\"")
NbGifts <- nrow(gifts)
# Determination of the relevant fiscal year ending dates
CalYear <- as.numeric(substr(gifts$date,7,10)) # calendar years
TCY <- as.numeric(names(table(CalYear))) # list of calendar years
MDFY <- "07-01-" # ending date for the current fiscal year
EFY <- paste(MDFY,TCY,sep="") # list of fiscal year ending dates
EFYplus <- cbind(TCY,EFY) # table of fiscal year ending dates
colnames(EFYplus) <- c("CalYear","EndDate")
# Manipulation of data frames in order to match
# the fiscal year end dates to the relevant dates
giftsPlusYear <- data.frame(CalYear, gifts, stringsAsFactors = FALSE)
giftsPlusEFY <- merge(giftsPlusYear,EFYplus) # using the CalYear
# Date comparison in order to associate a gift to its fiscal year
DateGift <- as.Date(giftsPlusEFY$date,"%m-%d-%y") # date conversion for comparison
DateEFY <- as.Date(giftsPlusEFY$EndDate,"%m-%d-%y")
FiscYear <- ifelse(DateGift<DateEFY,giftsPlusEFY$CalYear,giftsPlusEFY$CalYear+1)
# Computation of cumulative totals per fiscal year
LastFY <- 0
CumGift <- rep(0,NbGifts)
for (g in 1:NbGifts){
if (LastFY==FiscYear[g]){
CumGift[g] <- CumGift[g-1] + gifts$giftamt[g]
} else {
CumGift[g] <- gifts$giftamt[g]
LastFY <- FiscYear[g]
}
}
(CumGifts <- cbind(gifts,CumGift))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With