Extract the maximum value within each group in a dataframe [duplicate]

Tags: r, aggregate

I have a data frame with a grouping variable ("Gene") and a value variable ("Value"):

```
Gene   Value
A      12
A      10
B      3
B      5
B      6
C      1
D      3
D      4
```

For each level of my grouping variable, I wish to extract the maximum value. The result should thus be a data frame with one row per level of the grouping variable:

```
Gene   Value
A      12
B      6
C      1
D      4
```
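For a reproducible example, the same input and desired output can be built directly with `data.frame()`. The values are copied from the tables above. Storing `Gene` as a character column (rather than a factor) is an assumption made here for illustration.

```r
# Input: one row per observation (Gene label + numeric Value)
df <- data.frame(
  Gene  = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Value = c(12, 10, 3, 5, 6, 1, 3, 4),
  stringsAsFactors = FALSE  # keep Gene as character on R < 4.0 too
)

# Desired output: one row per Gene holding that group's maximum Value
expected <- data.frame(
  Gene  = c("A", "B", "C", "D"),
  Value = c(12, 6, 1, 4),
  stringsAsFactors = FALSE
)
```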

Could aggregate do the trick?

Johnathan asked Aug 14 '14 17:08



1 Answer

There are many ways to do this in R. Here are some of them:

```r
df <- read.table(header = TRUE, text = '
Gene   Value
A      12
A      10
B      3
B      5
B      6
C      1
D      3
D      4
')

# aggregate
aggregate(df$Value, by = list(df$Gene), max)  # result columns are named Group.1 and x
aggregate(Value ~ Gene, data = df, max)

# tapply
tapply(df$Value, df$Gene, max)

# split + lapply
lapply(split(df, df$Gene), function(y) max(y$Value))

# plyr
library(plyr)
ddply(df, .(Gene), summarise, Value = max(Value))

# dplyr
library(dplyr)
df %>% group_by(Gene) %>% summarise(Value = max(Value))

# data.table
library(data.table)
dt <- data.table(df)
dt[ , max(Value), by = Gene]  # result column is named V1

# doBy
library(doBy)
summaryBy(Value ~ Gene, data = df, FUN = max)

# sqldf
library(sqldf)
sqldf("select Gene, max(Value) as Value from df group by Gene", drv = 'SQLite')

# ave: returns whole rows; tied maxima give several rows for that group
df[as.logical(ave(df$Value, df$Gene, FUN = function(x) x == max(x))), ]
```
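The `ave` approach keeps every row whose `Value` equals its group maximum, so a tie produces more than one row for that group. If exactly one full row per group is wanted, one base-R option (not part of the answer above, offered here as a sketch) is to sort by group and descending value, then drop repeated group labels with `duplicated()`. The small `ties` data frame is hypothetical, made up only to show the tie behaviour.

```r
df <- data.frame(
  Gene  = c("A", "A", "B", "B", "B", "C", "D", "D"),
  Value = c(12, 10, 3, 5, 6, 1, 3, 4),
  stringsAsFactors = FALSE
)

# Sort by Gene (ascending), then by Value (descending), so each
# group's maximum comes first within that group
sorted <- df[order(df$Gene, -df$Value), ]

# duplicated() flags every repeat of a Gene after its first occurrence,
# so negating it keeps exactly one row (the maximum) per Gene
top <- sorted[!duplicated(sorted$Gene), ]
rownames(top) <- NULL
print(top)

# Hypothetical data with a tie in group A, to contrast with the ave() filter
ties <- data.frame(Gene = c("A", "A", "B"), Value = c(12, 12, 5),
                   stringsAsFactors = FALSE)
by_ave <- ties[as.logical(ave(ties$Value, ties$Gene,
                              FUN = function(x) x == max(x))), ]
print(nrow(by_ave))  # both tied A rows are kept, plus the B row
```

Which row wins a tie under `duplicated()` depends on the sort order; add further keys to `order()` if a specific row should be kept.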
EDi answered Oct 01 '22 09:10