Let's say I have an R data frame with 3 columns A, B and C, where the values of A are not all distinct.
How do I get, for each value of A, the value of C for which B is minimum (for that value of A)?
Something like, in pseudo-SQL: SELECT C WHERE B = MIN(B) GROUP BY A?
I have looked at the aggregate() function, but I am not sure it can do this.
aggregate(B ~ A, data = mydataframe, min)
only gives me the minimum of B for each A; I do not know how to get the corresponding C value from it.
Is there a way to subset the data frame with the result of this aggregation in order to get the C values, and/or can it be done in a single call to aggregate()?
Thanks
An example of what I would like to get:
input:
A B C
1 0 1
1 2 2
1 1 3
1 1 4
2 1 1
2 2 2
2 0 3
2 3 4
output:
1
3
1 is the value of C corresponding to the minimum of B (0) for A = 1
3 is the value of C corresponding to the minimum of B (0) for A = 2
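For anyone who wants to run the answers on this example, the input can be built roughly as follows. This is a minimal sketch: the name mydataframe is taken from the aggregate() call in the question, and the alias DF is added only because some answers refer to the data under that name.

```r
# Example input from the question, one vector per column
mydataframe <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 2, 2),
  B = c(0, 2, 1, 1, 1, 2, 0, 3),
  C = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# Alias (an assumption for convenience): some answers call the data frame DF
DF <- mydataframe
```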
You can use the data.table package:
library(data.table)
DT <- as.data.table(mydataframe)
DT[ , C[which.min(B)], by = "A"]
# A V1
# 1: 1 1
# 2: 2 3
Or dplyr:
library(dplyr)
mydataframe %>%
group_by(A) %>%
summarise(res = C[which.min(B)])
#   A res
# 1 1   1
# 2 2   3
Or the base function by:
by(mydataframe, mydataframe$A, function(x) x$C[which.min(x$B)])
# mydataframe$A: 1
# [1] 1
# -------------------------------------------------------------------------------
# mydataframe$A: 2
# [1] 3
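by() returns an object of class "by", which prints as shown above but is awkward to reuse. If a plain named vector is preferred, the same per-group function can be applied with split() and sapply(). This is a sketch of an alternative, not part of the original answer; the data frame is rebuilt here so the block runs on its own. Note that which.min() returns only the first minimum when there are ties.

```r
# Example data from the question
mydataframe <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 2, 2),
  B = c(0, 2, 1, 1, 1, 2, 0, 3),
  C = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# Split the rows by A, then pick C at the (first) minimum of B in each piece
res <- sapply(split(mydataframe, mydataframe$A),
              function(x) x$C[which.min(x$B)])

# res is a numeric vector named by the values of A: 1 for A = 1, 3 for A = 2
res
```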
1) SQLite guarantees that when you use min or max, the other column variables will come from the same row, so we get a particularly simple solution:
library(sqldf)
# one minimum per group
sqldf("select A, min(B) B, C from DF group by A")
If there can be duplicated minima and we want all of them, then this select with a correlated subquery works:
# all minima per group
sqldf("select * from DF x
where x.b = (select min(y.b) from DF y where y.a = x.a)")
2) Using ave in base R we can do this:
# one minimum per group
subset(DF, !! ave(B, A, FUN = function(x) seq_along(x) == which.min(x)))
# all minima per group
subset(DF, !! ave(B, A, FUN = function(x) x == min(x)))
3) If you do want to use aggregate, then do it like this:
# one minimum per group
sq <- 1:nrow(DF)
DF[aggregate(sq ~ A, DF, function(ix) ix[which.min(DF$B[ix])])$sq, ]
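The question also asks whether the data frame can be subset with the result of the aggregation. One way to sketch that, as an alternative to the approaches above rather than something taken from them, is to merge the per-group minima back onto the data. merge() joins on the columns the two data frames share (here A and B), so it keeps every row that ties for the minimum.

```r
# Example data from the question
DF <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 2, 2),
  B = c(0, 2, 1, 1, 1, 2, 0, 3),
  C = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# Step 1: minimum of B for each value of A
mins <- aggregate(B ~ A, data = DF, FUN = min)

# Step 2: join back on the shared columns A and B to recover C
res <- merge(mins, DF)

# res has one row per group here: C is 1 for A = 1 and 3 for A = 2
res
```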