
Apply function to complex subset in data.table

Tags: r, data.table, plyr

I'm new to data.table and I'd like to get better as I enter the realm of truly big datasets.

I'm trying to calculate yearly means for a variable x, but from Jun (year y - 1) to Jun (year y). This is easy using plyr:

set.seed(9)

dat = data.frame(
  year = rep(2000:2010, each = 12),
  month = 1:12,
  x = runif(12*length(2000:2010))
)

library(plyr)

ldply(unique(dat$year), function(i)
  if(i == unique(dat$year)[1]) NULL else # first year has no preceding Jul-Dec to draw on
    data.frame(
      year = i,
      # %in% rather than ==, so the match does not rely on vector recycling
      mean.x = mean(c(dat[dat$year == (i - 1) & dat$month %in% 7:12, "x"],
                      dat[dat$year == i & dat$month %in% 1:6, "x"]))
    )
)

But I'm struggling to convert the syntax into data.table. I'd prefer to do it without creating an intermediate variable for year shifting everything 6 months forward, as there are some variables I would like to summarize in their original Jan-Dec framing.

Any help is appreciated! Cheers

jslefche asked Apr 25 '26 21:04


2 Answers

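Using data.table, we lag 'year' by 6 rows with shift(), add 1, use that as the grouping variable, and get the mean of 'x'. The first group (NA, the first six rows) and the last group (only Jul-Dec of the final year) are dropped with [-c(1L, .N)]. This assumes the rows are sorted by year and month with no missing months.

library(data.table)

setDT(dat)[, .(Mean = mean(x)), .(year = shift(year, 6) + 1)][-c(1L, .N)]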
#  year      Mean
# 1: 2001 0.5086499
# 2: 2002 0.5197482
# 3: 2003 0.6547623
# 4: 2004 0.5869022
# 5: 2005 0.4502414
# 6: 2006 0.5000369
# 7: 2007 0.4514377
# 8: 2008 0.4566757
# 9: 2009 0.3844152
#10: 2010 0.5635942
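
Since `shift(year, 6)` works on row positions, it depends on the row order. As a rough sketch of an alternative, the grouping year can be derived from `month` directly inside `by`, which adds no column to `dat` and does not depend on sorting. This assumes the Jul (year y - 1) to Jun (year y) window used in the question's plyr code; `window_year`, `n` and `res` are names made up for this illustration.

```r
library(data.table)

set.seed(9)
dat <- as.data.table(data.frame(
  year  = rep(2000:2010, each = 12),
  month = 1:12,
  x     = runif(12 * length(2000:2010))
))

# Jul-Dec rows count toward the window that ends in the following year;
# the grouping expression lives only in `by`, so `dat` keeps its Jan-Dec columns
res <- dat[, .(mean.x = mean(x), n = .N),
           by = .(window_year = year + (month > 6L))]

# keep only complete 12-month windows (drops the partial first and last groups)
res <- res[n == 12L, .(window_year, mean.x)]
res
```

Filtering on the row count instead of dropping the first and last rows by position should also cope with data that starts or ends mid-year.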
akrun answered Apr 27 '26 16:04


dplyr provides a straightforward solution. Essentially, create a helper variable holding the relative year (the year you are going to group on), then group, summarize, and rename the column so it is called year again.

library(dplyr)


summaryDat <- dat %>%

  #assign relative year for calculation
  mutate(relYear = ifelse(month>6, year+1, year)) %>%

  #now group on relative year
  group_by(relYear) %>%

  #get your mean
  summarize(mean_x = mean(x)) %>%

  #now ungroup
  ungroup() %>%

  #format year nicely
  rename(year = relYear)
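
Note that the pipeline above also returns rows for the two incomplete windows (Jan-Jun of the first year and Jul-Dec of the last). A possible variant that keeps only full 12-month windows is sketched below; `n_months` and `summaryComplete` are names invented for this example, and the data is rebuilt so the snippet runs on its own.

```r
library(dplyr)

set.seed(9)
dat <- data.frame(
  year  = rep(2000:2010, each = 12),
  month = 1:12,
  x     = runif(12 * length(2000:2010))
)

summaryComplete <- dat %>%
  # assign relative year for calculation
  mutate(relYear = ifelse(month > 6, year + 1, year)) %>%
  group_by(relYear) %>%
  # mean plus the number of months that fell into each window
  summarize(mean_x = mean(x), n_months = n()) %>%
  # drop windows that do not cover a full 12 months
  filter(n_months == 12) %>%
  select(-n_months) %>%
  rename(year = relYear)

summaryComplete
```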
jebyrnes answered Apr 27 '26 14:04


