In my working dataset, I'm trying to calculate week-over-week values for the changes in wholesale and revenue. The code seems to work, but my estimates show it'll take about 75hrs to run what is a seemingly simple calculation. Below is the generic reproducible version which takes about 2m to run on this smaller dataset:
########################################################################################################################
# MAKE A GENERIC REPORDUCIBLE STACK OVERFLOW QUESTION
########################################################################################################################
# Create empty data frame of 26,000 observations similar to my data, but populated with noise
exampleData <- data.frame(product = rep(LETTERS,1000),
wholesale = rnorm(1000*26),
revenue = rnorm(1000*26))
# create a week_ending column which increases by one week with every set of 26 "products"
for(i in 1:nrow(exampleData)){
exampleData$week_ending[i] <- as.Date("2016-09-04")+7*floor((i-1)/26)
}
exampleData$week_ending <- as.Date(exampleData$week_ending, origin = "1970-01-01")
# create empty columns to fill
exampleData$wholesale_wow <- NA
exampleData$revenue_wow <- NA
# loop through the wholesale and revenue numbers and append the week-over-week changes
for(i in 1:nrow(exampleData)){
# set a condition where the loop only appends the week-over-week values if it's not the first week
if(exampleData$week_ending[i]!="2016-09-04"){
# set temporary values for the current and past week's wholesale value
currentWholesale <- exampleData$wholesale[i]
lastWeekWholesale <- exampleData$wholesale[which(exampleData$product==exampleData$product[i] &
exampleData$week_ending==exampleData$week_ending[i]-7)]
exampleData$wholesale_wow[i] <- currentWholesale/lastWeekWholesale -1
# set temporary values for the current and past week's revenue
currentRevenue <- exampleData$revenue[i]
lastWeekRevenue <- exampleData$revenue[which(exampleData$product==exampleData$product[i] &
exampleData$week_ending==exampleData$week_ending[i]-7)]
exampleData$revenue_wow[i] <- currentRevenue/lastWeekRevenue -1
}
}
Any help understanding why this takes so long or how to cut down the time would be much appreciated!
To calculate your Week-on-Week you can't just divide this week's results by the previous week's results. You need to subtract 1 after dividing your metric by the previous week. This is so you can see how much performance has changed, as opposed to how it relates to the previous week.
The first for
loop can be simplified with the following for:
exampleData$week_ending2 <- as.Date("2016-09-04") + 7 * floor((seq_len(nrow(exampleData)) - 1) / 26)
setequal(exampleData$week_ending, exampleData$week_ending2)
[1] TRUE
Replacing second for
loop
library(data.table)
dt1 <- as.data.table(exampleData)
dt1[, wholesale_wow := wholesale / shift(wholesale) - 1 , by = product]
dt1[, revenue_wow := revenue / shift(revenue) - 1 , by = product]
setequal(exampleData, dt1)
[1] TRUE
This takes about 4 milliseconds to run on my laptop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With