R data.table count panel data

Question

I have panel data (subject/year) from which I would like to keep only the subjects that appear the maximum number of times within each year. The data set is large, so I am using the data.table package. Is there a more elegant solution than what I have tried below?

library(data.table)

DT <- data.table(SUBJECT=c(rep('John',3), rep('Paul',2), 
                           rep('George',3), rep('Ringo',2), 
                           rep('John',2), rep('Paul',4), 
                           rep('George',2), rep('Ringo',4)), 
                 YEAR=c(rep(2011,10), rep(2012,12)), 
                 HEIGHT=rnorm(22), 
                 WEIGHT=rnorm(22))
DT

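# Count rows per subject-year, then the maximum of those counts per year
DT[, COUNT := .N, by='SUBJECT,YEAR']
DT[, MAXCOUNT := max(COUNT), by='YEAR']

# Keep the subject-years that reach the yearly maximum, then drop the helpers
# (:= works by reference, so its result does not need to be reassigned)
DT <- DT[COUNT==MAXCOUNT]
DT[, c('COUNT','MAXCOUNT') := NULL]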
DT

Matt Dowle · Accepted Answer

I'm not sure you'll view this as elegant, but how about:

DT[, COUNT := .N, by='SUBJECT,YEAR']
DT[, .SD[COUNT == max(COUNT)], by='YEAR']

That's essentially how to apply by to the i expression, as @SenorO commented. You'll still need [,COUNT:=NULL] afterwards, but for one temporary column rather than two.
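A minimal end-to-end sketch of this approach, including the cleanup of the temporary column, might look like the following. It rebuilds the toy table from the question; the set.seed() call and the name res are illustrative additions, not part of the original answer. Because the grouping column comes first in a by= result, res has YEAR as its first column.

```r
library(data.table)

# Same toy panel as in the question; set.seed() only makes HEIGHT/WEIGHT reproducible.
set.seed(1)
DT <- data.table(SUBJECT = c(rep('John', 3), rep('Paul', 2),
                             rep('George', 3), rep('Ringo', 2),
                             rep('John', 2), rep('Paul', 4),
                             rep('George', 2), rep('Ringo', 4)),
                 YEAR   = c(rep(2011, 10), rep(2012, 12)),
                 HEIGHT = rnorm(22),
                 WEIGHT = rnorm(22))

# One temporary column, added by reference.
DT[, COUNT := .N, by = 'SUBJECT,YEAR']

# Within each YEAR, keep the rows of .SD whose COUNT equals that year's maximum.
res <- DT[, .SD[COUNT == max(COUNT)], by = 'YEAR']

# Drop the helper column from both the result and the original table.
res[, COUNT := NULL]
DT[, COUNT := NULL]

res
```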

We do discourage .SD for speed reasons, though. Hopefully we'll get to this feature request soon, so that advice can be dropped: FR#2330 "Optimize .SD[i] query to keep the elegance but make it faster unchanged."
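One commonly used way to avoid .SD here is to collect row numbers with .I per group and subset once. This is a swapped-in idiom, not what the answer proposes, and whether it is faster on a given data set and data.table version is an assumption worth benchmarking. The names idx and res are illustrative.

```r
library(data.table)

set.seed(1)
DT <- data.table(SUBJECT = c(rep('John', 3), rep('Paul', 2),
                             rep('George', 3), rep('Ringo', 2),
                             rep('John', 2), rep('Paul', 4),
                             rep('George', 2), rep('Ringo', 4)),
                 YEAR   = c(rep(2011, 10), rep(2012, 12)),
                 HEIGHT = rnorm(22),
                 WEIGHT = rnorm(22))

# Temporary count per SUBJECT/YEAR, as in the answer.
DT[, COUNT := .N, by = .(SUBJECT, YEAR)]

# .I holds DT's row numbers; within each YEAR keep those whose COUNT is maximal.
# The grouped query returns columns YEAR and V1, where V1 holds the row numbers.
idx <- DT[, .I[COUNT == max(COUNT)], by = YEAR]$V1

# Subset once by row number (keeps the original column order), then clean up.
res <- DT[idx]
res[, COUNT := NULL]
DT[, COUNT := NULL]
```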

A different approach is as follows. It's faster and idiomatic but may be considered less elegant.

# Create a small aggregate table first. No need to use := on the big table.
i = DT[, .N, by='SUBJECT,YEAR']

# Find the even smaller subset. (Do as much as we can on the small aggregate.)
i = i[, .SD[N==max(N)], by=YEAR]

# Finally join the small subset of key values to the big table
setkey(DT, YEAR, SUBJECT)
DT[i]
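The same aggregate-then-join idea can also be sketched with the on= argument of [.data.table (available in reasonably recent data.table versions), which joins on named columns without calling setkey() and therefore without reordering the big table. This variant and the names agg, keep and res are illustrative, not from the answer.

```r
library(data.table)

set.seed(1)
DT <- data.table(SUBJECT = c(rep('John', 3), rep('Paul', 2),
                             rep('George', 3), rep('Ringo', 2),
                             rep('John', 2), rep('Paul', 4),
                             rep('George', 2), rep('Ringo', 4)),
                 YEAR   = c(rep(2011, 10), rep(2012, 12)),
                 HEIGHT = rnorm(22),
                 WEIGHT = rnorm(22))

# Small aggregate: one row per SUBJECT/YEAR with its row count N.
agg <- DT[, .N, by = .(SUBJECT, YEAR)]

# On the small table, keep the SUBJECT(s) with the maximal N in each YEAR.
keep <- agg[, .SD[N == max(N)], by = YEAR]

# Join back on explicit columns; DT is neither keyed nor reordered.
res <- DT[keep, on = c("YEAR", "SUBJECT")]

# The join carries N over from keep; drop it if it is not wanted.
res[, N := NULL]
```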

Something similar is here.


Tags: r, count, data.table

user1491868
