I have a dataset with multiple observations for each subject. For each subject, I want to select the row that has the maximum value of 'pt'. For example, with the following dataset:
```r
ID <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)
group <- data.frame(Subject=ID, pt=Value, Event=Event)

#   Subject pt Event
# 1       1  2     1
# 2       1  3     1
# 3       1  5     2  # max 'pt' for Subject 1
# 4       2  2     1
# 5       2  5     2
# 6       2  8     1
# 7       2 17     2  # max 'pt' for Subject 2
# 8       3  3     2
# 9       3  5     2  # max 'pt' for Subject 3
```
Subjects 1, 2, and 3 have maximum pt values of 5, 17, and 5, respectively.
How can I first find the largest pt value for each subject and then put that observation into another data frame? The resulting data frame should contain only the row with the largest pt value for each subject.
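A minimal base R sketch of the two-step approach the question describes, assuming only the example data (no packages): aggregate() finds the largest pt per Subject, and merge() joins those maxima back onto the original rows to recover the full observations. The names max_pt and result are illustrative, and this is one possible technique rather than the method used in the answers.

```r
# Rebuild the example data from the question
ID    <- c(1, 1, 1, 2, 2, 2, 2, 3, 3)
Value <- c(2, 3, 5, 2, 5, 8, 17, 3, 5)
Event <- c(1, 1, 2, 1, 2, 1, 2, 2, 2)
group <- data.frame(Subject = ID, pt = Value, Event = Event)

# Step 1: the largest pt for each Subject (one row per Subject)
max_pt <- aggregate(pt ~ Subject, data = group, FUN = max)

# Step 2: join back on both Subject and pt to recover the full rows;
# if a Subject had tied maxima, every tied row would be kept
result <- merge(max_pt, group, by = c("Subject", "pt"))
result
```

For this data, result has one row per Subject, with pt values 5, 17, and 5.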
max() in R: max() is a built-in R function that returns the largest value in its input, such as a vector, a data frame column, or an entirely numeric data frame. Applied to a whole column, it returns a single value across all subjects, so to get one maximum per subject it has to be applied within each group.
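To illustrate the difference, a small sketch using the example data: max() over the pt column collapses it to one overall value, while tapply() (a base R helper swapped in here, not used by the answers) applies max() separately within each Subject.

```r
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# max() over the whole column: a single number for all subjects
overall_max <- max(group$pt)   # 17

# tapply() splits pt by Subject and calls max() on each piece,
# returning a vector named by Subject
per_subject_max <- tapply(group$pt, group$Subject, max)
per_subject_max
```

This only gives the maximum values, not the full rows; the answers below use those per-group maxima to select whole observations.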
Here's a data.table solution:
```r
require(data.table) ## 1.9.2
group <- as.data.table(group)
```
If you want to keep all the entries corresponding to the max values of pt within each group:
```r
group[group[, .I[pt == max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2
```
If you'd like just the first max value of pt:
```r
group[group[, .I[which.max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2
```
In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
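To see where the two data.table variants would differ, here is a base R sketch on a hypothetical dataset (tied_group, not from the question) in which Subject 1 has two rows tied at pt = 5. Comparing each row with ave(..., FUN = max) keeps every tied row, like pt == max(pt); taking which.max() within each split() group keeps only the first, like .I[which.max(pt)]. These base R functions stand in for the data.table code and are not what the answer above uses.

```r
# Hypothetical data: Subject 1 has two rows tied at the maximum pt of 5
tied_group <- data.frame(Subject = c(1, 1, 1, 2, 2),
                         pt      = c(5, 3, 5, 2, 8),
                         Event   = c(1, 1, 2, 1, 2))

# Keep all ties: ave() returns each row's group maximum, aligned to the rows
is_max  <- tied_group$pt == ave(tied_group$pt, tied_group$Subject, FUN = max)
all_max <- tied_group[is_max, ]

# Keep only the first maximum: which.max() returns the first index of the max
first_max <- do.call(rbind, lapply(split(tied_group, tied_group$Subject),
                                   function(d) d[which.max(d$pt), ]))

nrow(all_max)    # 3: both tied rows for Subject 1, plus Subject 2
nrow(first_max)  # 2: one row per Subject
```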
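The most intuitive method is to use the group_by and top_n functions in dplyr. (Since dplyr 1.0.0, top_n() has been superseded by slice_max(), so slice_max(pt, n = 1) after group_by(Subject) selects the same rows.)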
```r
library(dplyr)
group %>% group_by(Subject) %>% top_n(1, pt)
```
The result is:

```
Source: local data frame [3 x 3]
Groups: Subject [3]

  Subject    pt Event
    (dbl) (dbl) (dbl)
1       1     5     2
2       2    17     2
3       3     5     2
```
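If you would rather not load a package, here is a base R sketch aimed at the same one-row-per-Subject result (an alternative technique, not the dplyr method above): sort by Subject and by decreasing pt, then keep the first row of each Subject with !duplicated(). Unlike top_n(), this keeps only one row per Subject when there are ties; because order() is stable, the row kept is the first maximum, as with which.max().

```r
group <- data.frame(Subject = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                    pt      = c(2, 3, 5, 2, 5, 8, 17, 3, 5),
                    Event   = c(1, 1, 2, 1, 2, 1, 2, 2, 2))

# Sort rows by Subject (ascending), then pt (descending);
# order() leaves any remaining ties in their original order
sorted <- group[order(group$Subject, -group$pt), ]

# The first row for each Subject now holds its maximum pt
top_rows <- sorted[!duplicated(sorted$Subject), ]
top_rows
```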