aggregate() puts multiple output columns in a matrix instead

Tags:

aggregate

I am to compute multiple quantiles for a certain variable:

> res1 <- aggregate(airquality$Wind, list(airquality$Month), function (x) quantile(x, c(0.9, 0.95, 0.975)))
> head(res1)
  Group.1   x.90%   x.95% x.97.5%
1       5 16.6000 17.5000 18.8250
2       6 14.9000 15.5600 17.3650
3       7 14.3000 14.6000 14.9000
4       8 12.6000 14.0500 14.6000
5       9 14.9600 15.5000 15.8025

The result looks good at first, but aggregate actually returns it in a very strange form, where the last 3 columns are not columns of a data.frame, but a single matrix!

Click to copy

> names(res1)
[1] "Group.1" "x"      
> dim(res1)
[1] 5 2
> class(res1[,2])
[1] "matrix"

This causes a lot of problems in further processing.

Few questions:

Why is aggregate() behaving so strange?
Is there any way to persuade it to make the result I expect?
Or am I perhaps using a wrong function for this purpose? Is there any other prefered way to get the wanted result?

Of course I could do some transformation of the output of aggregate(), but I look for some more simple and straightforward solution.

256

asked Jan 14 '15 13:01

1 Answers

Q1: Why is the behavior so strange?

This is actually a documented behavior at ?aggregate (though it may still be unexpected). The relevant argument to look at would be simplify.

If simplify is set to FALSE, aggregate would produce a list instead in a case like this.

Click to copy

res2 <- aggregate(airquality$Wind, list(airquality$Month), function (x) 
  quantile(x, c(0.9, 0.95, 0.975)), simplify = FALSE)
str(res2)
# 'data.frame':  5 obs. of  2 variables:
#  $ Group.1: int  5 6 7 8 9
#  $ x      :List of 5
#   ..$ 1  : Named num  16.6 17.5 18.8
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 32 : Named num  14.9 15.6 17.4
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 62 : Named num  14.3 14.6 14.9
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 93 : Named num  12.6 14.1 14.6
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"
#   ..$ 124: Named num  15 15.5 15.8
#   .. ..- attr(*, "names")= chr  "90%" "95%" "97.5%"

Now, both a matrix and a list as columns may seem to be strange behavior, but I presume it's more of a case of "status by design" rather than a "bug" or a "flaw".

For instance, consider the following: We want to aggregate both the "Wind" and the "Temp" columns from the "airquality" dataset, and we know that each aggregation would result in multiple columns (like we would expect with quantile).

Click to copy

res3 <- aggregate(cbind(Wind, Temp) ~ Month, airquality, 
                  function (x) quantile(x, c(0.9, 0.95, 0.975)))
res3
#   Month Wind.90% Wind.95% Wind.97.5% Temp.90% Temp.95% Temp.97.5%
# 1     5  16.6000  17.5000    18.8250   74.000   77.500     79.500
# 2     6  14.9000  15.5600    17.3650   87.300   91.100     92.275
# 3     7  14.3000  14.6000    14.9000   89.000   91.500     92.000
# 4     8  12.6000  14.0500    14.6000   94.000   95.000     96.250
# 5     9  14.9600  15.5000    15.8025   91.100   92.550     93.000

In some ways, keeping these values as matrix-columns might make sense--the data aggregated data are easily accessible by their original column names:

Click to copy

res3$Temp
#       90%   95%  97.5%
# [1,] 74.0 77.50 79.500
# [2,] 87.3 91.10 92.275
# [3,] 89.0 91.50 92.000
# [4,] 94.0 95.00 96.250
# [5,] 91.1 92.55 93.000

Q2: How do you get the results as separate columns in a `data.frame`?

But a list as a column is just as awkward to deal with as a matrix as a column in many cases. If you want to "flatten" your matrix into columns, use do.call(data.frame, ...):

Click to copy

do.call(data.frame, res1)
#   Group.1 x.90. x.95. x.97.5.
# 1       5 16.60 17.50 18.8250
# 2       6 14.90 15.56 17.3650
# 3       7 14.30 14.60 14.9000
# 4       8 12.60 14.05 14.6000
# 5       9 14.96 15.50 15.8025
str(.Last.value)
# 'data.frame':  5 obs. of  4 variables:
#  $ Group.1: int  5 6 7 8 9
#  $ x.90.  : num  16.6 14.9 14.3 12.6 15
#  $ x.95.  : num  17.5 15.6 14.6 14.1 15.5
#  $ x.97.5.: num  18.8 17.4 14.9 14.6 15.8a

Q3: Are there other alternatives?

As with most things R, yes of course. My preferred alternative would be to use the "data.table" package, with which you can do:

Click to copy

library(data.table)
as.data.table(airquality)[, as.list(quantile(Wind, c(.9, .95, .975))), 
                          by = Month]
#    Month   90%   95%   97.5%
# 1:     5 16.60 17.50 18.8250
# 2:     6 14.90 15.56 17.3650
# 3:     7 14.30 14.60 14.9000
# 4:     8 12.60 14.05 14.6000
# 5:     9 14.96 15.50 15.8025
str(.Last.value)
# Classes ‘data.table’ and 'data.frame':  5 obs. of  4 variables:
#  $ Month: int  5 6 7 8 9
#  $ 90%  : num  16.6 14.9 14.3 12.6 15
#  $ 95%  : num  17.5 15.6 14.6 14.1 15.5
#  $ 97.5%: num  18.8 17.4 14.9 14.6 15.8
#  - attr(*, ".internal.selfref")=<externalptr>

answered Oct 16 '22 19:10

A5C1D2H2I1M1N2O1R2T1

Related questions
                            
                                R verify source code
                            
                                How does one turn contour lines into filled contours?
                            
                                change code block color in knitr/markdown
                            
                                Feeding newdata to R predict function
                            
                                Disregarding simple warnings/errors in tryCatch()
                            
                                using scientific notation in R
                            
                                R, filter matrix based on variance cut-offs
                            
                                sprintf format strings: reference by name?
                            
                                R finding rows of a data frame where certain columns match those of another [duplicate]
                            
                                Argument of set.seed in R
                            
                                Numerical derivatives of an arbitrarily defined function
                            
                                "NAs introduced by coercion" during Cluster Analysis in R
                            
                                ggplot2 version 0.9.3.1 won't load into R 3.0.2
                            
                                How to use parameters with RPostgreSQL (to insert data)
                            
                                R barplot: wrapping long text labels?
                            
                                Speeding up the processing of large data frames in R
                            
                                How to add a non-overlapping legend to associate colors with categories in pairs()?
                            
                                How to get multiple ggplot2 scale_fill_gradientn with same scale?
                            
                                how to interpret cca vegan output
                            
                                PCA multiplot in R

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

aggregate() puts multiple output columns in a matrix instead

Tags:

r

aggregate

Tomas

People also ask

1 Answers

Q1: Why is the behavior so strange?

Q2: How do you get the results as separate columns in a `data.frame`?

Q3: Are there other alternatives?

A5C1D2H2I1M1N2O1R2T1

Recent Activity

Donate For Us

aggregate() puts multiple output columns in a matrix instead

Tags:

r

aggregate

Tomas

People also ask

1 Answers

Q1: Why is the behavior so strange?

Q2: How do you get the results as separate columns in a data.frame?

Q3: Are there other alternatives?

A5C1D2H2I1M1N2O1R2T1

Related questions

Recent Activity

Donate For Us

Q2: How do you get the results as separate columns in a `data.frame`?