
Select rows with the largest value of a variable within a group in R

a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

r<-sapply(split(a.3,a.2),function(x) which.max(x$b.2))

a.3[r,]

This returns the index within each group of the split list, not the row index for the entire data.frame.

I'm trying to return the largest value of b.2 for each subgroup of a.2. How can I do this efficiently?

Misha asked May 12 '10 19:05



2 Answers

library(plyr)
ddply(a.3, "a.2", subset, b.2 == max(b.2))
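
For comparison, here is a rough base R sketch of the same split-apply-combine idea without plyr. The seed and the names `pieces` and `top` are assumptions made for illustration; they are not part of the original answer.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

# Split by group, keep the rows at each group's maximum (ties included),
# then bind the pieces back into one data frame
pieces <- lapply(split(a.3, a.3$a.2), function(x) x[x$b.2 == max(x$b.2), ])
top <- do.call(rbind, pieces)
top
```

Like the `subset` call above, this keeps every row tied for the maximum, so a group can contribute more than one row.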
hadley answered Oct 22 '22 03:10



a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

The answer by Jonathan Chang gets you what you explicitly asked for, but I'm guessing that you want the actual row from the data frame.

sel <- ave(b.2, a.2, FUN = max) == b.2
a.3[sel,]
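
Note that the `ave` comparison keeps every row tied for a group's maximum. If exactly one row per group is wanted, one possible sketch is to run `which.max` over row numbers rather than over the split data frames, so that the result indexes the full data frame. The seed and the name `rows` are assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
a.2 <- sample(1:10, 100, replace = TRUE)
b.2 <- sample(1:100, 100, replace = TRUE)
a.3 <- data.frame(a.2, b.2)

# Split the row numbers (not the data frame) by group. which.max gives the
# position of the first maximum within the group; indexing back into i
# converts that position into a row number of the full data frame.
rows <- sapply(split(seq_len(nrow(a.3)), a.3$a.2),
               function(i) i[which.max(a.3$b.2[i])])
a.3[rows, ]
```

This is close to the attempt in the question; the difference is that the indices returned refer to `a.3` as a whole, so `a.3[rows, ]` picks the intended rows.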
John answered Oct 22 '22 03:10
