
How do I get a data.frame from R's aggregate function in the right format?

I'm having trouble getting R's aggregate() function to return a data.frame in the format that I'd like.

Basically I run the aggregation like so:

aggregate(df$res, list(df$depth), summary)

where the res column contains TRUE, FALSE and NA. I want to calculate the number of times each value of res occurs according to the groups in depth, which are six numeric depth values 0, 5, 15, 30, 60 and 100. According to the help page on the aggregate function it coerces the by values to factors, so this oughtn't be a problem (as far as I can tell).

So I run the aggregate function and store it in a data.frame. This is fine; it runs without error. The summary displayed in the R console looks like this:

  Group.1  x.Mode x.FALSE x.TRUE x.NA's
1       0 logical       3     83      0
2       5 logical       3     83      0
3      15 logical       8     78      0
4      30 logical       5     79      2
5      60 logical       1     64     21
6     100 logical       1     24     61

Again, this is fine, and looks like what I want. But the data.frame containing the results actually has only two columns, and looks like this:

    Group.1 x
1   0   logical
2   5   logical
3   15  logical
4   30  logical
5   60  logical
6   100 logical
7       3
8       3
9       8
10      5
11      1
12      1
13      83
14      83
15      78
16      79
17      64
18      24
19      0
20      0
21      0
22      2
23      21
24      61

I understand from the aggregate() help page that:

If the by has names, the non-empty times are used to label the columns in the results, with unnamed grouping variables being named Group.i for by[[i]].

which suggests to me that if the by has names then the output data.frame would look more like the summary of it that gets printed to the R console (i.e. it'd have 5 columns including a column of counts for each level in by) than the two-column version it actually gets saved as. The trouble is that the help page doesn't explain at all what a named by variable is, especially if it's coerced to a list from a data.frame column as in my case.

What do I need to do differently in order for the data.frame that results from aggregate() to have a column of counts for each level of by as the help suggests it could if I knew what I was doing?

hendra asked Feb 14 '14 00:02


1 Answer

This happens because aggregate returns a slightly odd result in this case: the last column is itself a matrix with four columns. The printed result looks like a 5-column data frame, but it is really a 2-column data frame whose second column is a 4-column matrix. Here is a workaround that converts it to a normal data.frame:

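# Toy data: 100 random TRUE/FALSE/NA values in four groups of 25
X <- aggregate(sample(c(TRUE, FALSE, NA), 100, replace = TRUE), list(rep(letters[1:4], 25)), summary)
# Keep the ordinary columns, then bind the matrix column's own columns alongside them
X <- cbind(X[-ncol(X)], X[[ncol(X)]])
str(X)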
# 'data.frame':  4 obs. of  5 variables:
# $ Group.1: chr  "a" "b" "c" "d"
# $ Mode   : Factor w/ 1 level "logical": 1 1 1 1
# $ FALSE  : Factor w/ 4 levels "10","4","6","8": 3 2 4 1
# $ TRUE   : Factor w/ 2 levels "15","8": 2 1 2 2
# $ NA's   : Factor w/ 4 levels "11","6","7","9": 1 2 4 3
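To check the "two columns, one of them a matrix" claim directly, here is a minimal sketch. The toy data frame df and the helper count_res are assumptions made up for illustration, not taken from the question. count_res stands in for summary so that every group returns a vector of the same length and the shape of the result does not depend on how summary() formats its output.

```r
# Hypothetical toy data; 'depth' and 'res' mirror the question's column names.
df <- data.frame(
  depth = rep(c(0, 5, 15), each = 4),
  res   = c(TRUE, TRUE, FALSE, NA,
            TRUE, FALSE, FALSE, NA,
            TRUE, TRUE, TRUE, NA)
)

# Hypothetical helper: a fixed-length, named vector of counts per group.
count_res <- function(v) {
  c(n_false = sum(!v, na.rm = TRUE),
    n_true  = sum(v, na.rm = TRUE),
    n_na    = sum(is.na(v)))
}

agg <- aggregate(df$res, list(df$depth), count_res)

ncol(agg)         # 2: Group.1 and x
is.matrix(agg$x)  # TRUE: the second column is a matrix
dim(agg$x)        # 3 3: one row per group, one column per count
```

Printing agg shows four columns, but ncol() and is.matrix() reveal the nested structure described above.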

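The oddness of the result comes from summary returning a length-4 vector (Mode, FALSE, TRUE, NA's) for each group instead of a single value, so aggregate stacks those vectors into a matrix. Because one of the four entries is the text "logical", the whole vector is stored as text, which is why the counts appear as factors in the str() output above. Convert them with as.numeric(as.character(...)) if you need numbers.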
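As for the "named by" in the help page, it just means that the elements of the list passed as by have names, for example list(depth = df$depth). That only changes the label Group.1; it does not flatten the matrix column. A minimal sketch of both points follows, again with hypothetical toy data (df) and a hypothetical helper (count_res) that returns numeric counts only. It uses do.call(data.frame, ...) as an alternative to the cbind workaround, since data.frame() expands a matrix argument into separate columns.

```r
# Hypothetical toy data; 'depth' and 'res' mirror the question's column names.
df <- data.frame(
  depth = rep(c(0, 5, 15), each = 4),
  res   = c(TRUE, TRUE, FALSE, NA,
            TRUE, FALSE, FALSE, NA,
            TRUE, TRUE, TRUE, NA)
)

# Hypothetical helper: numeric counts only, so nothing is coerced to text.
count_res <- function(v) {
  c(n_false = sum(!v, na.rm = TRUE),
    n_true  = sum(v, na.rm = TRUE),
    n_na    = sum(is.na(v)))
}

# A named 'by': the grouping column is labelled 'depth' instead of 'Group.1'.
agg <- aggregate(df$res, list(depth = df$depth), count_res)
names(agg)   # "depth" "x"

# Flatten the matrix column into ordinary columns.
flat <- do.call(data.frame, agg)
names(flat)  # "depth" "x.n_false" "x.n_true" "x.n_na"
```

Because count_res returns only numbers, the flattened columns stay numeric and need no conversion afterwards.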

BrodieG answered Sep 20 '22 19:09