I would like to select the row with the maximum value in each group using dplyr.
First, I generate some random data to illustrate my question:
set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))
In plyr, I could use a custom function to select this row.
library(plyr)
ddply(df, .(A, B), function(x) x[which.max(x$value),])
In dplyr, I am using the code below. It returns the maximum value for each group, but not the row that holds that maximum, so column C is lost.
library(dplyr)
df %>%
  group_by(A, B) %>%
  summarise(max = max(value))
How could I achieve this? Thanks for any suggestion.
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.2 plyr_1.8.1
loaded via a namespace (and not attached):
[1] assertthat_0.1.0.99 parallel_3.1.0 Rcpp_0.11.1
[4] tools_3.1.0
Try this:
result <- df %>%
  group_by(A, B) %>%
  filter(value == max(value)) %>%
  arrange(A, B, C)
Seems to work:
identical(
as.data.frame(result),
ddply(df, .(A, B), function(x) x[which.max(x$value),])
)
#[1] TRUE
As pointed out in the comments, slice may be preferred here, as per @RoyalITS' answer below, if you strictly want only one row per group. The filter approach above returns multiple rows when several rows in a group share the same maximum value.
df %>% group_by(A,B) %>% slice(which.max(value))
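To make the difference in tie handling concrete, here is a minimal sketch on a small made-up data frame (the names toy, g and id are hypothetical and not part of the question's data). It assumes dplyr is installed and compares how many rows each approach keeps when a group has two rows tied at its maximum.

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows tied at the maximum value 3.
toy <- data.frame(
  g     = c("a", "a", "a", "b", "b"),
  id    = 1:5,
  value = c(3, 3, 1, 2, 5)
)

# filter() keeps every row equal to the group maximum, so ties survive.
with_ties <- toy %>%
  group_by(g) %>%
  filter(value == max(value)) %>%
  ungroup()

# slice(which.max()) keeps only the first maximal row in each group.
one_per_group <- toy %>%
  group_by(g) %>%
  slice(which.max(value)) %>%
  ungroup()

nrow(with_ties)      # 3 rows: ids 1, 2 and 5
nrow(one_per_group)  # 2 rows: ids 1 and 5
```

Which behaviour is right depends on whether tied rows are meaningful in your data; which.max() always picks the first of the tied rows.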
You can use top_n
df %>% group_by(A, B) %>% top_n(n=1)
This will rank by the last column (value) and return the top n = 1 rows.
Currently, you can't change this default without causing an error (see https://github.com/hadley/dplyr/issues/426).
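For readers on a newer dplyr, a possible alternative is slice_max(), which supersedes top_n() and takes the ranking column explicitly. The sketch below assumes dplyr 1.0.0 or later, which is newer than the version shown in the question, and reuses the question's example data. The name top_rows is only illustrative.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where slice_max() is available

set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))

# Rank by `value` explicitly; with_ties = FALSE guarantees one row per group.
top_rows <- df %>%
  group_by(A, B) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(top_rows)  # 25, one row for each of the 5 x 5 (A, B) combinations
```

Leaving with_ties at its default of TRUE would instead keep every tied row, much like the filter(value == max(value)) approach.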
This more verbose solution gives greater control over what happens when the maximum value is duplicated within a group (in this example, it picks one of the tied rows at random):
library(dplyr)
df %>%
  group_by(A, B) %>%
  mutate(the_rank = rank(-value, ties.method = "random")) %>%
  filter(the_rank == 1) %>%
  select(-the_rank)
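If a reproducible result matters more than a random pick, the same pattern can be sketched with a different ties.method. The example below uses a small made-up data frame (toy, g and id are hypothetical names) and ties.method = "first", which is a standard option of base R's rank().

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows tied at the maximum value 3.
toy <- data.frame(
  g     = c("a", "a", "b"),
  id    = 1:3,
  value = c(3, 3, 5)
)

# ties.method = "first" always keeps the earliest tied row, so the result is
# reproducible; "min" would instead give every tied row rank 1 and keep them all.
first_of_ties <- toy %>%
  group_by(g) %>%
  mutate(the_rank = rank(-value, ties.method = "first")) %>%
  filter(the_rank == 1) %>%
  select(-the_rank) %>%
  ungroup()

first_of_ties$id  # 1 3
```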