Apply several summary functions on several variables by group in one call

Tags: r, r-faq, aggregate

I have the following data frame:

    x <- read.table(text = "  id1 id2 val1 val2
    1   a   x    1    9
    2   a   x    2    4
    3   a   y    3    5
    4   a   y    4    9
    5   b   x    1    7
    6   b   y    4    4
    7   b   x    3    9
    8   b   y    2    8", header = TRUE)

I want to calculate the mean of val1 and val2 grouped by id1 and id2, and simultaneously count the number of rows for each id1-id2 combination. I can perform each calculation separately:

    # calculate mean
    aggregate(. ~ id1 + id2, data = x, FUN = mean)

    # count rows
    aggregate(. ~ id1 + id2, data = x, FUN = length)

In order to do both calculations in one call, I tried:

    do.call("rbind", aggregate(. ~ id1 + id2, data = x,
                               FUN = function(x) data.frame(m = mean(x), n = length(x))))

However, I get a garbled output along with a warning:

    #     m   n
    # id1 1   2
    # id2 1   1
    #     1.5 2
    #     2   2
    #     3.5 2
    #     3   2
    #     6.5 2
    #     8   2
    #     7   2
    #     6   2
    # Warning message:
    #   In rbind(id1 = c(1L, 2L, 1L, 2L), id2 = c(1L, 1L, 2L, 2L), val1 = list( :
    #   number of columns of result is not a multiple of vector length (arg 1)

I could use the plyr package, but my data set is quite large and plyr becomes very slow (almost unusable) as the data set grows.

How can I use aggregate or other functions to perform several calculations in one call?

671

asked Aug 21 '12 22:08

broccoli

1 Answer

You can do it all in one step and get proper labeling:

    aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
    #   id1 id2 val1.mn val1.n val2.mn val2.n
    # 1   a   x     1.5    2.0     6.5    2.0
    # 2   b   x     2.0    2.0     8.0    2.0
    # 3   a   y     3.5    2.0     7.0    2.0
    # 4   b   y     3.0    2.0     6.0    2.0

This creates a dataframe with two id columns and two matrix columns:

    str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
    'data.frame':   4 obs. of  4 variables:
     $ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
     $ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
     $ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : NULL
      .. ..$ : chr  "mn" "n"
     $ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : NULL
      .. ..$ : chr  "mn" "n"
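To make the matrix-column structure a bit more concrete, here is a minimal sketch of how the individual summaries could be pulled out of the nested result. The names `res` and `v` are only illustrative and not part of the original answer. Note also that on R 4.0.0 or later `read.table` keeps `id1` and `id2` as character unless `stringsAsFactors = TRUE` is passed, so the `str()` output may show `chr` where `Factor` appears above.

```r
# Rebuild the example data from the question
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# `res` and `v` are illustrative names, not part of the original answer
res <- aggregate(. ~ id1 + id2, data = x,
                 FUN = function(v) c(mn = mean(v), n = length(v)))

names(res)           # only four top-level columns
is.matrix(res$val1)  # each value column is a matrix with one row per group
colnames(res$val1)   # the names given inside c() become the matrix column names

res$val1[, "mn"]     # group means of val1
res$val2[, "n"]      # group sizes
```

Indexing the matrix columns this way works, but many functions expect plain vector columns, which is why flattening the result is usually the more convenient route.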

As pointed out by @lord.garbage below, this can be converted to a dataframe with "simple" columns by using do.call(data.frame, ...):

    str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) ) )
    'data.frame':   4 obs. of  6 variables:
     $ id1    : Factor w/ 2 levels "a","b": 1 2 1 2
     $ id2    : Factor w/ 2 levels "x","y": 1 1 2 2
     $ val1.mn: num  1.5 2 3.5 3
     $ val1.n : num  2 2 2 2
     $ val2.mn: num  6.5 8 7 6
     $ val2.n : num  2 2 2 2
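One side effect visible in the output above is that the counts come back as `num` rather than `int`, because `c()` coerces the mean and the count to a common type. If integer counts matter, a possible follow-up (a sketch only; `res`, `flat`, `n_cols` and `v` are illustrative names, and the regular expression assumes the count is named `n` as in this answer) is to convert the `.n` columns after flattening:

```r
# Rebuild the example data from the question
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

res <- aggregate(. ~ id1 + id2, data = x,
                 FUN = function(v) c(mn = mean(v), n = length(v)))

# Flatten the matrix columns into ordinary columns
flat <- do.call(data.frame, res)

# Columns whose names end in ".n" hold the counts; store them as integers
n_cols <- grep("\\.n$", names(flat))
flat[n_cols] <- lapply(flat[n_cols], as.integer)

str(flat)
```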

This is the syntax for multiple variables on the LHS:

    aggregate(cbind(val1, val2) ~ id1 + id2, data = x,
              FUN = function(x) c(mn = mean(x), n = length(x) ) )
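The same pattern should extend to more than two statistics. As a hedged illustration (the helper `summ` and the names `res2` and `flat2` are hypothetical, and `sd` is just an example of a third summary, not something the question asked for), the summary function can be defined once and reused with the `cbind()` form:

```r
# Rebuild the example data from the question
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# Hypothetical helper: returns a named vector of summaries for one group
summ <- function(v) c(mn = mean(v), sd = sd(v), n = length(v))

# Explicitly list the variables to summarise on the left-hand side
res2 <- aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = summ)

# Flatten to one ordinary column per variable/statistic pair
flat2 <- do.call(data.frame, res2)
names(flat2)
```

Each name used inside `c()` ends up as a suffix on the flattened column names, so keeping those names short and distinct makes the result easier to work with.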

152

answered Sep 26 '22 03:09

IRTFM
