
Subsetting a data frame with top-n rows for each group, and ordered by a variable

I would like to subset a data frame to the top n rows within each group, where the groups are defined by one variable and the rows are sorted in descending order by another. An example makes this clearer:

    d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                     Age = c(15, 38, 17, 35, 26, 24, 20, 26))

For each Gender, I would like the 2 rows with the highest Age, sorted by Age in descending order. The desired output is:

Gender  Age  
F   35  
F   26  
M   38  
M   26  

I looked for order, sort and other solutions here, but could not find an appropriate solution to this problem. I appreciate your help.

karlos asked May 20 '11 17:05



2 Answers

One solution uses ddply() from the plyr package:

    library(plyr)
    # For each Gender, sort by Age (descending) and keep the first 2 rows
    ddply(d1, "Gender", function(x) head(x[order(x$Age, decreasing = TRUE), ], 2))
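If plyr is not available, the same split-apply-combine idea can be sketched in base R. This is only an illustration under the question's example data; the helper name `top_n_by_age` is made up here and is not part of any package.

```r
# Same example data as in the question.
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Hypothetical helper: the top n rows of one group, largest Age first.
top_n_by_age <- function(x, n = 2) {
  head(x[order(x$Age, decreasing = TRUE), ], n)
}

# Split by Gender, apply the helper to each piece, then bind the pieces together.
pieces <- lapply(split(d1, d1$Gender), top_n_by_age, n = 2)
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop the "F.4"-style row names that rbind creates
result
```

Because head() returns at most n rows, a group with fewer than n rows is returned whole rather than padded.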
Chase answered Oct 18 '22 03:10



With the data.table package:

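    library(data.table)
    dt1 <- data.table(d1)  # to speed this up, you can add setkey(dt1, Gender)
    # head() avoids the NA rows that indexing with [1:2] gives for groups with fewer than 2 rows
    dt1[, head(.SD[order(Age, decreasing = TRUE)], 2), by = Gender]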
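A possible variation, shown only as a sketch on the question's example data, is to sort the whole table once in `i` and then take the first rows of each group. Using `keyby` instead of `by` is a choice made here so that the groups come back in sorted order (F before M), as in the desired output.

```r
library(data.table)

# Same example data as in the question.
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))
dt1 <- data.table(d1)

# Sort all rows by descending Age, then keep the first 2 rows per Gender.
# keyby sorts the result by Gender; the within-group order from `i` is kept.
top2 <- dt1[order(-Age), head(.SD, 2), keyby = Gender]
top2
```

Sorting once up front avoids calling order() separately inside every group, which may matter when there are many groups.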
Wojciech Sobala answered Oct 18 '22 03:10
