R - aggregate data for one column by another column, based on a statistic of a third column

Tags: r, aggregate

Let's say I have an R data frame with three columns, A, B and C, where the values of A are not all distinct.

How do I get, for each value of A, the value of C for which B is minimal (within that value of A)? Something like this pseudo-SQL: SELECT C WHERE B = MIN(B) GROUP BY A

I have looked at the aggregate() function, but I am not sure it can do this.

aggregate(B ~ A, data = mydataframe, min) only gives me the minimum of B for each A, and I do not know how to get the corresponding C value.

Is there a way to subset the data frame with the result of this aggregation to get the C values, and/or can it be done in a single call to aggregate()?

Thanks

An example of what I would like to get:

input:

output:

1
3

1 is the value of C corresponding to the minimum of B (0) for A = 1

3 is the value of C corresponding to the minimum of B (0) for A = 2
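The input table did not survive extraction. As a stand-in, the data frame below is hypothetical (invented here, not the asker's data), chosen only so that it reproduces the stated output. The split()/sapply() lines are a sketch of one base-R way to get that output.

```r
# Hypothetical input: invented so that it reproduces the stated output
# (C = 1 for A = 1 and C = 3 for A = 2); the real input table was lost.
mydataframe <- data.frame(
  A = c(1, 1, 2, 2, 2),
  B = c(0, 5, 3, 0, 2),
  C = c(1, 4, 2, 3, 6)
)

# Split the rows by A, then in each group take C at the row where B is smallest
res <- sapply(split(mydataframe, mydataframe$A),
              function(d) d$C[which.min(d$B)])
res
# 1 2
# 1 3
```

which.min() returns the first minimum, so with tied minima only the earliest row of each group is used.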


asked Feb 19 '14 12:02

Jeanpierre Nenuphar

2 Answers

You can use the data.table package:

library(data.table)
DT <- as.data.table(mydataframe)

DT[ , C[which.min(B)], by = "A"]
#    A V1
# 1: 1  1
# 2: 2  3

Or dplyr:

library(dplyr)
mydataframe %>%
  group_by(A) %>%
  summarise(res = C[which.min(B)])
#   A res
# 1 1   1
# 2 2   3

Or the base function by:

by(mydataframe, mydataframe$A, function(x) x$C[which.min(x$B)])
# mydataframe$A: 1
# [1] 1
# -------------------------------------------------------------------------------
# mydataframe$A: 2
# [1] 3
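Another base-R option (not part of the answer above) avoids calling a function per group: sort by A and then by B, and keep the first row of each A. A sketch, assuming the hypothetical data below:

```r
# Hypothetical example data with the question's columns A, B and C
mydataframe <- data.frame(
  A = c(1, 1, 2, 2, 2),
  B = c(0, 5, 3, 0, 2),
  C = c(1, 4, 2, 3, 6)
)

# Sort by A, then by B ascending, so each group's minimum-B row comes first
sorted <- mydataframe[order(mydataframe$A, mydataframe$B), ]

# Keep only the first row of every A group, i.e. the row with the smallest B
first_per_group <- sorted[!duplicated(sorted$A), c("A", "C")]
first_per_group
```

Because order() leaves remaining ties in their original order, tied minima resolve to the earliest row, matching which.min().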


answered Oct 20 '22 14:10

Sven Hohenstein

1) When a query uses min or max, SQLite guarantees that the other (bare) columns come from the same row as the minimum or maximum, so there is a particularly simple solution (the data frame is called DF in this answer):

library(sqldf)

# one minimum per group
sqldf("select A, min(B) B, C from DF group by A")

If a group can have tied minima and we want all of them, this select using a correlated subquery works:

# all minima per group
sqldf("select * from DF x 
      where x.b = (select min(y.b) from DF y where y.a = x.a)")

2) Using ave from base R we can do this:

# one minimum per group
subset(DF, !! ave(B, A, FUN = function(x) seq_along(x) == which.min(x)))

# all minima per group
subset(DF, !! ave(B, A, FUN = function(x) x == min(x)))
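Why the `!!` is there: ave() returns a result of the same type as its first argument, so with a numeric B the per-group TRUE/FALSE values come back as 1/0, and the double negation turns them back into a logical vector. A small sketch on a hypothetical DF:

```r
# Hypothetical data frame named DF, as in this answer
DF <- data.frame(
  A = c(1, 1, 2, 2, 2),
  B = c(0, 5, 3, 0, 2),
  C = c(1, 4, 2, 3, 6)
)

# ave() keeps the type of B (numeric), so the logical flags are stored as 0/1
flags <- ave(DF$B, DF$A, FUN = function(x) x == min(x))
flags
# [1] 1 0 0 1 0

# !! (two negations) converts 0/1 back to FALSE/TRUE for subset()
kept <- subset(DF, !!flags)
kept
```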

3) If you do want to use aggregate, do it like this:

# one minimum per group
sq <- seq_len(nrow(DF))  # row numbers; per A, keep the one whose B is smallest
DF[aggregate(sq ~ A, DF, function(ix) ix[which.min(DF$B[ix])])$sq, ]
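The question also asks whether the aggregate() result can be used to subset the data frame. One way to do that (a merge() technique swapped in here, not part of this answer) is to join the per-group minima back onto the original rows. A sketch, assuming a hypothetical DF:

```r
# Hypothetical data frame named DF, as in this answer
DF <- data.frame(
  A = c(1, 1, 2, 2, 2),
  B = c(0, 5, 3, 0, 2),
  C = c(1, 4, 2, 3, 6)
)

# Per-group minimum of B: one row per A, with columns A and B
mins <- aggregate(B ~ A, data = DF, FUN = min)

# Inner join on both A and B keeps the rows of DF whose B equals its
# group's minimum (all of them if a group has tied minima)
merged <- merge(mins, DF, by = c("A", "B"))
merged
```

Like the correlated subquery in 1), this returns every tied minimum rather than just one per group.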

answered Oct 20 '22 15:10

G. Grothendieck
