I have a very large data set that looks like this:
df <- data.frame(school=c("a", "a", "a", "b","b","c","c","c"), year=c(3,3,1,4,2,4,3,1), GPA=c(4,4,4,3,3,3,2,2))
school year GPA
a 3 4
a 3 4
a 1 4
b 4 3
b 2 3
c 4 3
c 3 2
c 1 2
and I want it to look like this:
school year GPA
a 3 4
a 3 4
b 4 3
c 4 3
In other words, for each school I want the student(s) in the top year, regardless of GPA.
I have tried:
new_df <- df[!duplicated(paste(df[,1],df[,2])),]
but this gives me the unique combinations of school and year, while the following gives me only the first row for each school:
new_df2 <- df[!duplicated(df$school),]
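Neither attempt can work as-is, because duplicated() keeps only the first row in each group, while school a needs two tied rows. A minimal sketch showing that even sorting each school by year (descending) first still loses the tie; the names sorted and one_per_school are illustrative only:

```r
df <- data.frame(school = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 year   = c(3, 3, 1, 4, 2, 4, 3, 1),
                 GPA    = c(4, 4, 4, 3, 3, 3, 2, 2))

# Sort by school, then by year descending, so each school's top year comes first
sorted <- df[order(df$school, -df$year), ]

# duplicated() flags every repeat after the first, so this keeps ONE row per school
one_per_school <- sorted[!duplicated(sorted$school), ]
print(one_per_school)

# 3 rows, but the desired result has 4 (school a has two year-3 students)
nrow(one_per_school)
```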
Using the plyr library:
library(plyr)
ddply(df, .(school), function(x) x[x$year == max(x$year), ])
which returns:
school year GPA
1 a 3 4
2 a 3 4
3 b 4 3
4 c 4 3
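A different technique, not part of the original answer: base R's ave() computes a per-group summary and returns it aligned with the original rows, so comparing year against each school's maximum gives the same rows without plyr or an explicit split. A minimal sketch, assuming the df from the question; is_top and top are illustrative names:

```r
df <- data.frame(school = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 year   = c(3, 3, 1, 4, 2, 4, 3, 1),
                 GPA    = c(4, 4, 4, 3, 3, 3, 2, 2))

# ave() returns, for every row, the max year within that row's school,
# so the comparison keeps all rows tied for the top year
is_top <- df$year == ave(df$year, df$school, FUN = max)
top <- df[is_top, ]
print(top)
```

Because it is a single logical index, this also keeps the original row names (1, 2, 4, 6).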
Or, in base R with split():
test<-lapply(split(df,df$school),function(x){x[x$year==max(x$year),]})
out<-do.call(rbind,test)
> out
school year GPA
a.1 a 3 4
a.2 a 3 4
b b 4 3
c c 4 3
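The rows of out are named after the list elements (a.1, a.2, b, c in the output above). If plain sequential row names are preferred, one option is to reset them after binding; a minimal sketch:

```r
df <- data.frame(school = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 year   = c(3, 3, 1, 4, 2, 4, 3, 1),
                 GPA    = c(4, 4, 4, 3, 3, 3, 2, 2))

test <- lapply(split(df, df$school), function(x) x[x$year == max(x$year), ])
out <- do.call(rbind, test)

# Drop the list-derived row names and renumber the rows 1..n
rownames(out) <- NULL
print(out)
```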
Explanation:
split() divides the data frame into a list of data frames, one per school:
dat<-split(df,df$school)
> dat
$a
school year GPA
1 a 3 4
2 a 3 4
3 a 1 4
$b
school year GPA
4 b 4 3
5 b 2 3
$c
school year GPA
6 c 4 3
7 c 3 2
8 c 1 2
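To confirm the shape of what split() returns, a quick sketch that inspects the list rather than printing it:

```r
df <- data.frame(school = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 year   = c(3, 3, 1, 4, 2, 4, 3, 1),
                 GPA    = c(4, 4, 4, 3, 3, 3, 2, 2))

dat <- split(df, df$school)

# A named list: one element per distinct school
names(dat)          # "a" "b" "c"

# Each element is an ordinary data frame holding that school's rows
sapply(dat, nrow)   # a = 3, b = 2, c = 3
is.data.frame(dat$a)
```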
For each school we want the members in the top year, which this helper function selects:
dum.fun<-function(x){x[x$year==max(x$year),]}
> dum.fun(dat$a)
school year GPA
1 a 3 4
2 a 3 4
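Comparing against max() is what keeps ties. For contrast, a sketch using which.max(), which returns only the index of the first maximum; keep_ties and first_only are hypothetical helper names for illustration:

```r
df <- data.frame(school = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 year   = c(3, 3, 1, 4, 2, 4, 3, 1),
                 GPA    = c(4, 4, 4, 3, 3, 3, 2, 2))
dat <- split(df, df$school)

# Logical comparison: every row whose year equals the maximum is kept
keep_ties <- function(x) x[x$year == max(x$year), ]

# which.max() gives a single index (the first maximum), so ties are dropped
first_only <- function(x) x[which.max(x$year), ]

nrow(keep_ties(dat$a))   # 2
nrow(first_only(dat$a))  # 1
```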
lapply() applies a function to each element of a list and returns a list:
> lapply(split(df,df$school),function(x){x[x$year==max(x$year),]})
$a
school year GPA
1 a 3 4
2 a 3 4
$b
school year GPA
4 b 4 3
$c
school year GPA
6 c 4 3
This is what we want, but in list form. To bind the list members together into one data frame, we use do.call(rbind, ...), which calls rbind once with every list member as an argument.
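To make the do.call step concrete, a sketch comparing it with writing out the rbind call by hand. The values match; only the row names differ, because do.call passes the list names (a, b, c) to rbind as argument names:

```r
df <- data.frame(school = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 year   = c(3, 3, 1, 4, 2, 4, 3, 1),
                 GPA    = c(4, 4, 4, 3, 3, 3, 2, 2))
test <- lapply(split(df, df$school), function(x) x[x$year == max(x$year), ])

# Equivalent to rbind(a = test$a, b = test$b, c = test$c),
# but works for any number of list elements
via_do_call <- do.call(rbind, test)

# The same bind written out manually for this three-school example
by_hand <- rbind(test$a, test$b, test$c)

print(via_do_call)
print(by_hand)
```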