
aggregate prints incorrect number of columns

I used the aggregate function to get the range by factor level. I am trying to rename the columns, but the output from the aggregate function does not have the min and max as separate columns.

# example data
size_cor <- data.frame(SpCode = rep(c(200, 400, 401), 3),
                       Length = c(45, 23, 56, 89, 52, 85, 56, 45, 78))

# aggregate function
spcode_range <- with(size_cor, aggregate(Length, list(SpCode), FUN = range))

Output:

spcode_range 

  Group.1 x.1 x.2
1     200  45  89
2     400  23  52
3     401  56  85

Data structure:

str(spcode_range)

'data.frame':   3 obs. of  2 variables:
 $ Group.1: num  200 400 401
 $ x      : num [1:3, 1:2] 45 23 56 89 52 85

dim(spcode_range)
[1] 3 2

The printed output shows three columns: Group.1, x.1 (min) and x.2 (max), but the data frame has only 2 columns. I have tried setNames, rename and names with no success, because I am trying to name three columns when R sees only 2.

user41509 asked Nov 10 '22 08:11



1 Answer

What happened here is that you called range by group, and range returns two values per group. aggregate returned a data.frame (which it always does unless the input is of class ts) and stored those pairs in a single column that is itself a two-column matrix.

When you print the result, print.data.frame is dispatched, which in turn calls format.data.frame. That function expands each column of the matrix column into a separate column (see str(format.data.frame(spcode_range))), so what gets printed does not reflect the actual structure of the data.frame you are trying to print (don't ask me why; probably for convenience, as it is not clear how else to print a matrix within a data.frame).
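To make the mismatch between the printed output and the stored structure concrete, here is a small sketch that inspects the object from the question. It only uses base R and the question's own data; the comments describe what each call is checking.

```r
# Recreate the data and the aggregate call from the question
size_cor <- data.frame(SpCode = rep(c(200, 400, 401), 3),
                       Length = c(45, 23, 56, 89, 52, 85, 56, 45, 78))
spcode_range <- with(size_cor, aggregate(Length, list(SpCode), FUN = range))

# The data.frame itself has two columns: Group.1 and the matrix column x
ncol(spcode_range)          # 2
is.matrix(spcode_range$x)   # TRUE
dim(spcode_range$x)         # 3 2

# The matrix column can be indexed like any other matrix
spcode_range$x[, 1]         # minimum per group: 45 23 56
spcode_range$x[, 2]         # maximum per group: 89 52 85

# The formatted copy used for printing is where the matrix column is split up
names(format.data.frame(spcode_range))
```

This is why renaming three columns fails: names(spcode_range) has length 2, and "x.1"/"x.2" only exist in the formatted copy.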

One way to fix this is to combine do.call and cbind.data.frame, e.g.

res <- do.call(cbind.data.frame, aggregate(Length ~ SpCode, size_cor, range))
str(res)
# 'data.frame': 3 obs. of  3 variables:
# $ SpCode  : num  200 400 401
# $ Length.1: num  45 23 56
# $ Length.2: num  89 52 85
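Once the result is flattened into three real columns, the renaming the question asked about should work with setNames. A minimal sketch, where the names min_length and max_length are just illustrative choices and not part of the original answer:

```r
# Data from the question
size_cor <- data.frame(SpCode = rep(c(200, 400, 401), 3),
                       Length = c(45, 23, 56, 89, 52, 85, 56, 45, 78))

# Flatten the matrix column into ordinary columns, as in the answer
res <- do.call(cbind.data.frame, aggregate(Length ~ SpCode, size_cor, range))

# Now there really are three columns, so three names can be assigned
# (min_length and max_length are hypothetical names chosen for illustration)
res <- setNames(res, c("SpCode", "min_length", "max_length"))
str(res)
```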

Or use another package such as dplyr or data.table, which were designed to (among other things) replace or improve data manipulation operations in R.
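As a rough sketch of the dplyr route (assuming the dplyr package is installed; the column names min_length and max_length are illustrative, not from the original answer), computing the minimum and maximum as separate summaries avoids the matrix column altogether:

```r
library(dplyr)

# Data from the question
size_cor <- data.frame(SpCode = rep(c(200, 400, 401), 3),
                       Length = c(45, 23, 56, 89, 52, 85, 56, 45, 78))

# One row per SpCode, with min and max as ordinary named columns
# (min_length and max_length are hypothetical names chosen for illustration)
res_dplyr <- size_cor %>%
  group_by(SpCode) %>%
  summarise(min_length = min(Length), max_length = max(Length))

res_dplyr
```

Because each summary is named in the call, no separate renaming step is needed afterwards.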

David Arenburg answered Nov 15 '22 07:11
