I'm trying to calculate the minimum values of a numeric column for each level of a factor, while keeping values of another factor in the resulting data frame.
# dummy data
dat <- data.frame(
code = c("HH11", "HH45", "JL03", "JL03", "JL03", "HH11"),
index = c("023434", "3377477", "3388595", "3377477", "1177777", "023434"),
value = c(24.1, 37.2, 78.9, 45.9, 20.0, 34.6)
)
The result I want is the minimum of value for each level of code, keeping index in the resulting data frame.
# result I want:
# code value index
# 1 HH11 24.1 023434
# 2 HH45 37.2 3377477
# 3 JL03 20.0 1177777
# ddply attempt
library(plyr)
ddply(dat, ~ code, summarise, val = min(value))
# code val
# 1 HH11 24.1
# 2 HH45 37.2
# 3 JL03 20.0
# base R attempt
aggregate(value ~ code, dat, min)
# code value
# 1 HH11 24.1
# 2 HH45 37.2
# 3 JL03 20.0
The aggregate() function computes summary statistics of the data by group, for example mean, min, or sum.
To use aggregate for the mean in R, specify the numerical variable as the first argument, the categorical variable (as a list) as the second, and the function to apply (in this case mean) as the third. Alternatively, specify a formula of the form numerical ~ categorical.
aggregate is a base R function that, as the name suggests, aggregates the input data.frame by applying the function given in the FUN parameter to each column of the sub-data.frames defined by the by parameter. The by parameter has to be a list.
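The two calling conventions described above can be sketched side by side. This is a minimal illustration on the question's dummy data, using min instead of mean; the names by_formula and by_list are only illustrative.

```r
# Same dummy data as in the question
dat <- data.frame(
  code  = c("HH11", "HH45", "JL03", "JL03", "JL03", "HH11"),
  index = c("023434", "3377477", "3388595", "3377477", "1177777", "023434"),
  value = c(24.1, 37.2, 78.9, 45.9, 20.0, 34.6)
)

# Formula interface: numerical ~ categorical
by_formula <- aggregate(value ~ code, data = dat, FUN = min)

# x/by interface: the grouping variable must be wrapped in a (named) list
by_list <- aggregate(dat$value, by = list(code = dat$code), FUN = min)
names(by_list)[2] <- "value"  # the aggregated column is called "x" by default
```

Both calls should give one row per code with the group minimum; neither keeps the index column, which is why the answers below combine aggregate with something else.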
You need to use merge on the result of aggregate and the original data.frame:
merge(aggregate(value ~ code, dat, min), dat, by = c("code", "value"))
## code value index
## 1 HH11 24.1 023434
## 2 HH45 37.2 3377477
## 3 JL03 20.0 1177777
Just to show that there are always multiple ways to skin a cat:
Using ave to get the indexes of the minimum rows in each group:
dat[which(ave(dat$value,dat$code,FUN=function(x) x==min(x))==1),]
# code index value
#1 HH11 023434 24.1
#2 HH45 3377477 37.2
#5 JL03 1177777 20.0
This method also has the potential benefit of returning multiple rows per code group when several rows share the minimum value.
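To see the tie behaviour, here is a small sketch on made-up data (the tied data frame and its values are hypothetical, not from the question) in which two JL03 rows share the minimum:

```r
# Hypothetical data with a tie: both JL03 rows hold the group minimum
tied <- data.frame(
  code  = c("HH11", "JL03", "JL03"),
  index = c("023434", "3388595", "1177777"),
  value = c(24.1, 20.0, 20.0)
)

# ave() returns, for every row, 1 if the row holds its group's minimum, else 0
is_min <- ave(tied$value, tied$code, FUN = function(x) x == min(x))

# Keep every row flagged as a minimum; tied rows are all retained
tied_min <- tied[is_min == 1, ]
```

Here tied_min keeps both JL03 rows, so whether this is a benefit depends on whether you want exactly one row per group.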
And another method using by:
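# take the row holding the minimum in each group (taking x[1, ] would pair the
# minimum value with the index of the group's first row, not the minimum row)
do.call(rbind,
by(dat, dat$code, function(x) x[which.min(x$value), ])
)
# code index value
# HH11 HH11 023434 24.1
# HH45 HH45 3377477 37.2
# JL03 JL03 1177777 20.0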