
Aggregate multiple variables using multiple different FUN in R

This is an extension of the questions asked here: Aggregate / summarize multiple variables per group (e.g. sum, mean).

  • Specifically, if I have multiple variables to aggregate, is there a way to change the FUN each variable is aggregated by?

Example:

dat <- data.frame(ID = rep(letters[1:3], each =3), Plot = rep(1:3,3),Val1 = (1:9)*10, Val2 = (1:9)*20)

> dat
  ID Plot Val1 Val2
1  a    1   10   20
2  a    2   20   40
3  a    3   30   60
4  b    1   40   80
5  b    2   50  100
6  b    3   60  120
7  c    1   70  140
8  c    2   80  160
9  c    3   90  180


#Aggregate 2 variables using the *SAME* FUN
  aggregate(cbind(Val1, Val2) ~ ID, dat, sum)

  ID Val1 Val2
1  a   60  120
2  b  150  300
3  c  240  480
  • but notice that both variables are summed.

What if I want to take the sum of Val1 and the mean of Val2??

The best solution I have is:

merge(
  aggregate(Val1 ~ ID, dat, sum),
  aggregate(Val2 ~ ID, dat, mean),
  by = c('ID')
)
  • But I'm wondering if there is a cleaner/shorter way to go about doing this...

Can I do this all in aggregate?

  • (I didn't see anything in the aggregate code that made it seem like this could work, but I've been wrong before...)

Example #2:

(as requested, using mtcars)
Reduce(function(df1, df2) merge(df1, df2, by = c('cyl', 'am'), all = TRUE),
  list(
    aggregate(hp ~ cyl + am, mtcars, sum, na.rm = TRUE),
    aggregate(wt ~ cyl + am, mtcars, min),
    aggregate(qsec ~ cyl + am, mtcars, mean, na.rm = TRUE),
    aggregate(mpg ~ cyl + am, mtcars, mean, na.rm = TRUE)
  )
)

#I'd want a straightforward alternative like:
  aggregate(cbind(hp,wt,qsec,mpg) ~ cyl + am, mtcars, list(sum, min, mean, mean), na.rm = T)

  # ^(I know this doesn't work)

Note: I would prefer a base R approach, but I already realize dplyr or some other package probably does this "better"

theforestecologist asked Nov 29 '25 15:11


1 Answer

Consider pairing each column with a function name, then running Map to build a list of aggregated data frames, one per column. This works because aggregate accepts function names as strings. Then run Reduce to merge the data frames in that list together.

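cols <- grep("Val", names(dat), value = TRUE)   # "Val1" "Val2"
fcts <- c("sum", "mean")                        # one function name per column, in the same order

# Aggregate each column with its paired function
df_list <- Map(function(col, fn) aggregate(. ~ ID, dat[c("ID", col)], FUN = fn), cols, fcts)

# Merge the per-column results on the grouping key
final_df <- Reduce(function(x, y) merge(x, y, by = "ID"), df_list)

final_df
#   ID Val1 Val2
# 1  a   60   40
# 2  b  150  100
# 3  c  240  160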

Be sure the column and function vectors are the same length, repeating function names where several columns share one.
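One way to guard against a length mismatch is to wrap the Map and Reduce steps in a small helper that checks the lengths first. This is only a sketch: the name aggregate_each and its arguments are hypothetical, not part of base R, and it uses the non-formula form of aggregate so the grouping columns can be passed as a character vector.

```r
# Hypothetical helper (not part of base R): applies fcts[i] to cols[i] within
# the groups defined by the columns named in `by`, then merges the results.
aggregate_each <- function(data, by, cols, fcts) {
  # Fail early instead of letting Map recycle the shorter vector
  stopifnot(length(cols) == length(fcts))
  df_list <- Map(function(col, fn) {
    aggregate(data[col], by = data[by], FUN = fn)
  }, cols, fcts)
  Reduce(function(x, y) merge(x, y, by = by), df_list)
}

dat <- data.frame(ID = rep(letters[1:3], each = 3), Plot = rep(1:3, 3),
                  Val1 = (1:9) * 10, Val2 = (1:9) * 20)

# Sum of Val1 and mean of Val2 per ID
res <- aggregate_each(dat, by = "ID", cols = c("Val1", "Val2"),
                      fcts = c("sum", "mean"))
res
```

Note that, unlike the formula interface, this form of aggregate does not drop rows containing NA in the value columns before aggregating, so results can differ when the data has missing values.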

And to demonstrate with mtcars:

cols <- c("hp", "wt", "qsec", "mpg")
fcts <- c("sum", "min", "mean", "mean")

df_list <- Map(function(col, fn) aggregate(. ~ cyl + am, mtcars[c("cyl", "am", col)], FUN = fn), cols, fcts)

Reduce(function(x,y) merge(x,y, by=c("cyl", "am")), df_list)

#   cyl am   hp    wt     qsec      mpg
# 1   4  0  254 2.465 20.97000 22.90000
# 2   4  1  655 1.513 18.45000 28.07500
# 3   6  0  461 3.215 19.21500 19.12500
# 4   6  1  395 2.620 16.32667 20.56667
# 5   8  0 2330 3.435 17.14250 15.05000
# 6   8  1  599 3.170 14.55000 15.40000
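As a possible base R alternative that avoids the repeated aggregate and merge calls, the groups can be split once and each group summarised in a single step. This is a sketch of a different technique (split plus lapply), not the approach used in the answer above, and the names groups, rows, and res are chosen here for illustration.

```r
# Split mtcars into one data frame per (cyl, am) combination;
# drop = TRUE discards combinations that do not occur in the data.
groups <- split(mtcars, list(mtcars$cyl, mtcars$am), drop = TRUE)

# Summarise each group, choosing a different function for each column
rows <- lapply(groups, function(g) {
  data.frame(cyl = g$cyl[1], am = g$am[1],
             hp = sum(g$hp), wt = min(g$wt),
             qsec = mean(g$qsec), mpg = mean(g$mpg))
})

# Stack the one-row data frames and sort by the grouping keys
res <- do.call(rbind, rows)
res <- res[order(res$cyl, res$am), ]
rownames(res) <- NULL
res
```

The trade-off is that the column-to-function pairing is written out by hand rather than driven by two vectors, which is shorter for a fixed set of columns but less convenient when the pairs are built programmatically.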
Parfait answered Dec 02 '25 04:12


