Aggregate by factor levels, keeping other variables in the resulting data frame

Tags:

r

I'm trying to calculate the minimum values of a numeric column for each level of a factor, while keeping values of another factor in the resulting data frame.

# dummy data
dat <- data.frame(
    code = c("HH11", "HH45", "JL03", "JL03", "JL03", "HH11"), 
    index = c("023434", "3377477", "3388595", "3377477", "1177777", "023434"), 
    value = c(24.1, 37.2, 78.9, 45.9, 20.0, 34.6)
    )

The result I want is the minimum of value for each level of code, keeping index in the resulting data frame.

# result I want:
#   code value    index
# 1 HH11  24.1   023434
# 2 HH45  37.2  3377477
# 3 JL03  20.0  1177777


# ddply attempt
library(plyr)
ddply(dat, ~ code, summarise, val = min(value))
#   code   val
# 1 HH11  24.1
# 2 HH45  37.2
# 3 JL03  20.0


# base R attempt
aggregate(value ~ code, dat, min)
#   code value
# 1 HH11  24.1
# 2 HH45  37.2
# 3 JL03  20.0

asked Apr 26 '13 01:04

Chris

2 Answers

You need to merge the result of aggregate with the original data.frame, matching on both code and value:

merge(aggregate(value ~ code, dat, min), dat, by = c("code", "value"))
##   code value   index
## 1 HH11  24.1  023434
## 2 HH45  37.2 3377477
## 3 JL03  20.0 1177777
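
To make the two steps explicit, here is a minimal sketch that splits the one-liner apart. The intermediate names `mins` and `res` are only illustrative. It assumes the `value` column is compared exactly, which holds here because the minima are taken from the same column they are matched against.

```r
# Same dummy data as in the question
dat <- data.frame(
    code  = c("HH11", "HH45", "JL03", "JL03", "JL03", "HH11"),
    index = c("023434", "3377477", "3388595", "3377477", "1177777", "023434"),
    value = c(24.1, 37.2, 78.9, 45.9, 20.0, 34.6)
)

# Step 1: one row per code, holding the group minimum
mins <- aggregate(value ~ code, dat, min)

# Step 2: keep only the original rows whose (code, value) pair is a minimum;
# this is what brings the index column back
res <- merge(mins, dat, by = c("code", "value"))
res
```

Because the join is on the pair (code, value), any row of the original data that ties for the minimum in its group would also be returned.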

answered Oct 02 '22 16:10

CHP

Just to show that there are always multiple ways to skin a cat:

Using ave to get the indexes of the minimum rows in each group:

dat[which(ave(dat$value,dat$code,FUN=function(x) x==min(x))==1),]

#  code   index value
#1 HH11  023434  24.1
#2 HH45 3377477  37.2
#5 JL03 1177777  20.0

This method also has the potential benefit of returning every matching row when several rows in the same code group tie for the minimum.
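
A small sketch of the tie behaviour, using made-up data (`dat_ties` is hypothetical and not part of the question) in which two JL03 rows share the minimum value:

```r
# Hypothetical data: two JL03 rows share the minimum value 20.0
dat_ties <- data.frame(
    code  = c("HH11", "JL03", "JL03", "JL03"),
    index = c("023434", "3388595", "3377477", "1177777"),
    value = c(24.1, 20.0, 45.9, 20.0)
)

# ave() applies FUN within each code group; the logical result is stored
# as 0/1 in a numeric vector, hence the comparison with 1
is_min <- ave(dat_ties$value, dat_ties$code, FUN = function(x) x == min(x)) == 1

# Rows 1, 2 and 4 are kept: both tied JL03 rows survive
tied <- dat_ties[is_min, ]
tied
```

Whether keeping all tied rows is desirable depends on the use case; if exactly one row per group is required, the ties need to be broken by some other rule.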

And another method using by:

do.call(rbind,
  # take the whole row holding the group minimum, so index stays paired with value
  by(dat, dat$code, function(x) x[which.min(x$value), ])
)
#      code   index value
# HH11 HH11  023434  24.1
# HH45 HH45 3377477  37.2
# JL03 JL03 1177777  20.0

answered Oct 02 '22 17:10

thelatemail
