
Select the row with the maximum value in each group

Tags: dataframe, r, r-faq

I have a dataset with multiple observations for each subject. For each subject, I want to select the row that has the maximum value of 'pt'. For example, with the following dataset:

    ID    <- c(1,1,1,2,2,2,2,3,3)
    Value <- c(2,3,5,2,5,8,17,3,5)
    Event <- c(1,1,2,1,2,1,2,2,2)

    group <- data.frame(Subject=ID, pt=Value, Event=Event)
    #   Subject pt Event
    # 1       1  2     1
    # 2       1  3     1
    # 3       1  5     2 # max 'pt' for Subject 1
    # 4       2  2     1
    # 5       2  5     2
    # 6       2  8     1
    # 7       2 17     2 # max 'pt' for Subject 2
    # 8       3  3     2
    # 9       3  5     2 # max 'pt' for Subject 3

Subjects 1, 2, and 3 have maximum pt values of 5, 17, and 5, respectively.

How can I find the largest pt value for each subject and then put the corresponding row into another data frame? The resulting data frame should contain only the row with the largest pt value for each subject.

asked Jul 03 '14 15:07 by Xinting WANG



2 Answers

Here's a data.table solution:

    require(data.table) ## 1.9.2
    group <- as.data.table(group)

If you want to keep all the entries corresponding to max values of pt within each group:

    group[group[, .I[pt == max(pt)], by=Subject]$V1]
    #    Subject pt Event
    # 1:       1  5     2
    # 2:       2 17     2
    # 3:       3  5     2

If you'd like just the first max value of pt:

    group[group[, .I[which.max(pt)], by=Subject]$V1]
    #    Subject pt Event
    # 1:       1  5     2
    # 2:       2 17     2
    # 3:       3  5     2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
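
To see where the two variants diverge, here is a minimal sketch in base R (no data.table required) using the same two expressions, `pt == max(pt)` and `which.max(pt)`. The `tied` data frame is hypothetical and not part of the question's data; it was made up so that Subject 1 reaches its maximum twice.

```r
# Hypothetical data with a tie: Subject 1 reaches its maximum pt of 5 twice.
tied <- data.frame(Subject = c(1, 1, 1, 2, 2),
                   pt      = c(5, 3, 5, 2, 8),
                   Event   = c(1, 1, 2, 1, 2))

# Variant 1: keep every row whose pt equals its group's maximum (ties kept).
# ave() returns the per-group maximum aligned with the original rows.
all_max <- tied[tied$pt == ave(tied$pt, tied$Subject, FUN = max), ]

# Variant 2: keep only the first maximum row per group (ties dropped).
# which.max() returns the position of the first maximum within each group.
first_idx <- vapply(split(seq_len(nrow(tied)), tied$Subject),
                    function(i) i[which.max(tied$pt[i])],
                    integer(1))
first_max <- tied[first_idx, ]

print(all_max)
print(first_max)
```

With this data, the first variant should return both tied rows for Subject 1, while the second returns one row per subject.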

answered Oct 16 '22 03:10 by Arun


An intuitive method is to use the group_by and top_n functions from dplyr:

    library(dplyr)

    group %>% group_by(Subject) %>% top_n(1, pt)

The result is:

    Source: local data frame [3 x 3]
    Groups: Subject [3]

      Subject    pt Event
        (dbl) (dbl) (dbl)
    1       1     5     2
    2       2    17     2
    3       3     5     2
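
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). The following is a sketch of an equivalent call, assuming dplyr >= 1.0.0 is installed; the name `result` is only illustrative. Like top_n(), slice_max() keeps tied rows by default (`with_ties = TRUE`).

```r
library(dplyr)

# Recreate the data frame from the question so the example is self-contained.
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# For each Subject, keep the row(s) with the largest pt.
# Set with_ties = FALSE to keep exactly one row per group.
result <- group %>%
  group_by(Subject) %>%
  slice_max(pt, n = 1) %>%
  ungroup()

print(result)
```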

answered Oct 16 '22 02:10 by Xi Liang