 

Create an efficient week-over-week calculation with subsetting

In my working dataset, I'm trying to calculate week-over-week values for the changes in wholesale and revenue. The code seems to work, but by my estimate it will take about 75 hours to run what seems like a simple calculation. Below is a generic reproducible version, which takes about 2 minutes to run on this smaller dataset:

########################################################################################################################
# MAKE A GENERIC REPRODUCIBLE STACK OVERFLOW QUESTION
########################################################################################################################

# Create a data frame of 26,000 observations similar to my data, but populated with noise
exampleData <- data.frame(product = rep(LETTERS,1000),
                          wholesale = rnorm(1000*26), 
                          revenue = rnorm(1000*26))

# create a week_ending column which increases by one week with every set of 26 "products"
for(i in 1:nrow(exampleData)){
  exampleData$week_ending[i] <- as.Date("2016-09-04")+7*floor((i-1)/26)
}
exampleData$week_ending <- as.Date(exampleData$week_ending, origin = "1970-01-01")

# create empty columns to fill
exampleData$wholesale_wow <- NA
exampleData$revenue_wow <- NA

# loop through the wholesale and revenue numbers and append the week-over-week changes
for(i in 1:nrow(exampleData)){
  # set a condition where the loop only appends the week-over-week values if it's not the first week
  if(exampleData$week_ending[i]!="2016-09-04"){
    # set temporary values for the current and past week's wholesale value
    currentWholesale <- exampleData$wholesale[i]
    lastWeekWholesale <- exampleData$wholesale[which(exampleData$product==exampleData$product[i] & 
                                                       exampleData$week_ending==exampleData$week_ending[i]-7)] 
    exampleData$wholesale_wow[i] <- currentWholesale/lastWeekWholesale -1

    # set temporary values for the current and past week's revenue
    currentRevenue <- exampleData$revenue[i]
    lastWeekRevenue <- exampleData$revenue[which(exampleData$product==exampleData$product[i] & 
                                                   exampleData$week_ending==exampleData$week_ending[i]-7)] 
    exampleData$revenue_wow[i] <- currentRevenue/lastWeekRevenue -1
  }
}

Any help understanding why this takes so long or how to cut down the time would be much appreciated!

asked Sep 27 '17 at 18:09 by Will Wright


People also ask

How do you calculate week over week?

To calculate week-over-week change, you can't just divide this week's result by the previous week's result; you also need to subtract 1 after dividing. The ratio alone tells you how this week relates to last week (for example 1.5), whereas subtracting 1 gives the change in performance (0.5, i.e. a 50% increase).
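As a minimal illustration in R, the formula is "ratio minus 1". The function name `wow` and the numbers are made up for the example. A previous value of 0 gives `Inf` or `NaN`, and a missing previous value gives `NA`.

```r
# Hypothetical helper: relative change from last week's value to this week's value
wow <- function(current, previous) {
  current / previous - 1
}

wow(150, 100)  # 0.5, i.e. a 50% increase
wow(75, 100)   # -0.25, i.e. a 25% decrease
```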


1 Answer

The first for loop can be replaced with the following vectorized expression:

exampleData$week_ending2 <- as.Date("2016-09-04") + 7 * floor((seq_len(nrow(exampleData)) - 1) / 26)

setequal(exampleData$week_ending, exampleData$week_ending2)
[1] TRUE
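The same dates can also be built another way, offered here as an alternative and not taken from the answer. Generate the 1,000 weekly dates with `seq()`, then repeat each one 26 times, once per product. Like the expression above, this assumes the rows are ordered week by week with all 26 products in each block. The variable names `n_products`, `n_weeks` and `week_ending3` are illustrative.

```r
n_products <- 26
n_weeks <- 1000
row_id <- seq_len(n_products * n_weeks)

# Vectorized version from the answer: the week index advances every 26 rows
week_ending2 <- as.Date("2016-09-04") + 7 * floor((row_id - 1) / n_products)

# Alternative: 1,000 consecutive weekly dates, each repeated once per product
weeks <- seq(as.Date("2016-09-04"), by = "week", length.out = n_weeks)
week_ending3 <- rep(weeks, each = n_products)

all(week_ending2 == week_ending3)  # TRUE
```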

Replacing the second for loop:

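library(data.table)
dt1 <- as.data.table(exampleData)
# shift() lags each column by one row within each product; this relies on the
# rows already being in week order within each product, as they are here
dt1[, wholesale_wow := wholesale / shift(wholesale) - 1, by = product]
dt1[, revenue_wow := revenue / shift(revenue) - 1, by = product]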

setequal(exampleData, dt1)
[1] TRUE
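If data.table isn't available, the same grouped lag can be sketched in base R with `ave()`. This is a swapped-in technique, not part of the answer. `ave()` applies a function within each `product` group and returns the results in the original row order. Like `shift()`, it assumes the rows are already sorted by week within each product. `lag_ratio` is a hypothetical helper name, and `set.seed()` is added only to make the noise reproducible.

```r
set.seed(1)
exampleData <- data.frame(product = rep(LETTERS, 1000),
                          wholesale = rnorm(1000 * 26),
                          revenue = rnorm(1000 * 26))

# Hypothetical helper: each value divided by the previous value in the same group, minus 1.
# The first element of each group has no previous week, so it gets NA.
lag_ratio <- function(v) v / c(NA, head(v, -1)) - 1

exampleData$wholesale_wow <- ave(exampleData$wholesale, exampleData$product, FUN = lag_ratio)
exampleData$revenue_wow   <- ave(exampleData$revenue,   exampleData$product, FUN = lag_ratio)
```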

This takes about 4 milliseconds to run on my laptop.
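As for why the original loop is slow: for each of the 26,000 rows it calls `which()` over the full `product` and `week_ending` columns, twice. The total work therefore grows roughly with the square of the number of rows, and each element-wise assignment into a `data.frame` column adds R-level overhead on top. That would explain why the runtime balloons on a larger dataset.

Below is a base-R sketch, not from the answer, that keeps the question's lookup logic (find the same product one week earlier) but does every lookup at once with `match()`. The names `current_key`, `previous_key` and `prev_row` are illustrative. Unlike `shift()`, this does not depend on row order, and a row with no previous week gets `NA`.

```r
set.seed(1)
exampleData <- data.frame(product = rep(LETTERS, 1000),
                          wholesale = rnorm(1000 * 26),
                          revenue = rnorm(1000 * 26))
exampleData$week_ending <- as.Date("2016-09-04") +
  7 * floor((seq_len(nrow(exampleData)) - 1) / 26)

# Key for each row, and the key of the row holding the same product one week earlier
current_key  <- paste(exampleData$product, exampleData$week_ending)
previous_key <- paste(exampleData$product, exampleData$week_ending - 7)

# One hashed lookup for all rows instead of a which() scan per row;
# rows with no previous week (the first week) get NA
prev_row <- match(previous_key, current_key)

exampleData$wholesale_wow <- exampleData$wholesale / exampleData$wholesale[prev_row] - 1
exampleData$revenue_wow   <- exampleData$revenue   / exampleData$revenue[prev_row]   - 1
```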

answered Sep 30 '22 at 13:09 by manotheshark