R Subset data.frame from max value of one vector and grouped by another [duplicate]

Tags: r, subset

ID <- c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C')
WK <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5)
NumSuccess <- c(0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 3)
Data <- data.frame(ID, WK, NumSuccess)

I am trying to create a subset data.frame "Data2" that keeps, for each "ID", the "NumSuccess" value on the row with the maximum "WK". The resulting data.frame should look like this:

ID <- c('A', 'B', 'C')
WK <- c(3, 3, 5)
NumSuccess <- c(2, 1, 3)
Data2 <- data.frame(ID, WK, NumSuccess)
user3594490 asked Apr 12 '15 05:04

1 Answer

This can be done in more than one way. If several rows within an 'ID' can tie for the maximum 'WK' and you want all of those rows, group by 'ID' and filter with the logical condition WK == max(WK).

library(dplyr)
Data %>% 
      group_by(ID) %>% 
      filter(WK==max(WK))
#   ID WK NumSuccess
#1  A  3          2
#2  B  3          1
#3  C  5          3

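If there is a single maximum 'WK' per 'ID' (or one row per 'ID' is enough, since which.max returns only the first maximum), we can use which.max with slice, or use arrange to order the dataset by descending 'WK' and take the first row for each 'ID'.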

Data %>% 
      group_by(ID) %>%
      slice(which.max(WK))
#   ID WK NumSuccess
#1  A  3          2
#2  B  3          1
#3  C  5          3

Data %>%
      group_by(ID) %>%
      arrange(-WK) %>% 
      slice(1)
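
To make the difference between the two conditions concrete, here is a minimal base R sketch. The `Tied` data frame is hypothetical (it is not part of the question) and is built so that ID 'A' reaches its maximum 'WK' twice; `split`/`lapply` stand in for the grouping step so that no package is needed.

```r
# Hypothetical data with a tie: ID 'A' reaches its maximum WK (3) on two rows.
Tied <- data.frame(ID = c('A', 'A', 'A', 'B', 'B'),
                   WK = c(1, 3, 3, 1, 2),
                   NumSuccess = c(0, 2, 5, 0, 1))

# Comparing against the group maximum keeps every tied row.
all_max <- do.call(rbind, lapply(split(Tied, Tied$ID),
                                 function(d) d[d$WK == max(d$WK), ]))

# which.max() returns only the first index of the maximum: one row per ID.
first_max <- do.call(rbind, lapply(split(Tied, Tied$ID),
                                   function(d) d[which.max(d$WK), ]))

nrow(all_max)    # 3: both tied rows for 'A', plus one row for 'B'
nrow(first_max)  # 2: one row per ID
```

With tied data the two conditions give different results, so the choice depends on whether duplicate maxima should be kept.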

Similar approaches in data.table are

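library(data.table)
# all rows tied for the maximum WK within each ID
setDT(Data)[, .SD[max(WK)==WK], ID]
# first row with the maximum WK within each ID
setDT(Data)[, .SD[which.max(WK)], ID]
# sort by WK, then take the last row of each ID
setkey(setDT(Data), WK)[, .SD[.N], ID]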

Or we can use ave from base R:

Data[with(Data, ave(WK, ID, FUN=max)==WK),]
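
As a rough illustration of why the ave approach works, the sketch below splits it into two steps using the data from the question. The intermediate name `group_max` is introduced here only for illustration; ave returns a vector as long as 'WK' in which each element holds the maximum of its own group.

```r
# Data from the question, rebuilt so the block is self-contained.
Data <- data.frame(ID = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'),
                   WK = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5),
                   NumSuccess = c(0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 3))

# Each row gets the maximum WK of its own ID.
group_max <- ave(Data$WK, Data$ID, FUN = max)
group_max
# 3 3 3 3 3 3 5 5 5 5 5

# Keep the rows whose WK equals their group maximum.
Data2 <- Data[group_max == Data$WK, ]
rownames(Data2) <- NULL   # drop the original row numbers (3, 6, 11)
Data2
```

This keeps all tied rows, like the filter(WK==max(WK)) version.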
akrun answered Nov 09 '22 07:11