Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Moving Averages on multiple columns - Grouped Data

Tags:

r

data.table

Apologies if this has been answered. I've gone through numerous examples today but I can't find any that match what I am trying to do.

I have a data set which I need to calculate a 3 point moving average on. I've generated some dummy data below:

set.seed(1234)
data.frame(Week = rep(seq(1:5), 3), 
 Section = c(rep("a", 5), rep("b", 5), rep("c", 5)), 
 Qty = runif(15, min = 100, max = 500), 
 To =  runif(15, min = 40, max = 80))

I want to calculate the MA for each group based on the 'Section' column for both the 'Qty' and the 'To' columns. Ideally the output would be a data table. The moving average would start at Week 3 so would be the average of wks 1:3

I am trying to master the data.table package so a solution using that would be great but otherwise any will be much appreciated.

Just for reference my actual data set will have approx. 70 sections with c.1M rows in total. I've found the data.table to be extremely fast at crunching these kind of volumes so far.

like image 523
MidnightDataGeek Avatar asked Mar 14 '16 15:03

MidnightDataGeek


2 Answers

We could use rollmean from the zoo package, in combination with data.table .

library(data.table)
library(zoo)
setDT(df)[, c("Qty.mean","To.mean") := lapply(.SD, rollmean, k = 3, fill = NA, align = "right"), 
              .SDcols = c("Qty","To"), by = Section]
 > df
 #   Week Section      Qty       To Qty.mean  To.mean
 #1:    1       a 145.4814 73.49183       NA       NA
 #2:    2       a 348.9198 51.44893       NA       NA
 #3:    3       a 343.7099 50.67283 279.3703 58.53786
 #4:    4       a 349.3518 47.46891 347.3271 49.86356
 #5:    5       a 444.3662 49.28904 379.1426 49.14359
 #6:    1       b 356.1242 52.66450       NA       NA
 #7:    2       b 103.7983 52.10773       NA       NA
 #8:    3       b 193.0202 46.36184 217.6476 50.37802
 #9:    4       b 366.4335 41.59984 221.0840 46.68980
#10:    5       b 305.7005 48.75198 288.3847 45.57122
#11:    1       c 377.4365 72.42394       NA       NA
#12:    2       c 317.9899 61.02790       NA       NA
#13:    3       c 213.0934 76.58633 302.8400 70.01272
#14:    4       c 469.3734 73.25380 333.4856 70.28934
#15:    5       c 216.9263 41.83081 299.7977 63.89031
like image 72
mtoto Avatar answered Sep 23 '22 12:09

mtoto


A solution using dplyr:

library(dplyr); library(zoo)
myfun = function(x) rollmean(x, k = 3, fill = NA, align = "right")
df %>% group_by(Section) %>% mutate_each(funs(myfun), Qty, To)
#### Week Section      Qty       To
#### (int)  (fctr)    (dbl)    (dbl)
#### 1      1       a       NA       NA
#### 2      2       a       NA       NA
#### 3      3       a 279.3703 58.53786
#### 4      4       a 347.3271 49.86356
like image 29
agenis Avatar answered Sep 25 '22 12:09

agenis