 

What does a "standard formula interface to a data.frame" mean in R?

Tags: r, aggregate

The documentation for aggregate states:

‘aggregate.formula’ is a standard formula interface to ‘aggregate.data.frame’.

I am new to R, and I don't understand what this means. Please explain!

Thanks!

Uri

asked Sep 16 '11 21:09 by Uri Laserson


1 Answer

Jump to the middle of the examples section of help(aggregate) and you will see this:

 ## Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
 aggregate(weight ~ feed, data = chickwts, mean)
 aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
 aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)
 aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, sum)

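These are four different calls to aggregate(), all using the formula interface. The wording in the documentation you quote has to do with the method-dispatch mechanism used throughout R: a generic function such as aggregate() selects which method to run based on the class of its first argument.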
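To make the dispatch idea concrete, here is a minimal sketch using a hypothetical generic, summarise_by(), which is not part of R and exists only for illustration. It is organised the same way as aggregate(): the formula method merely turns `weight ~ feed` plus `data` into two data frames via model.frame() and then delegates to the data.frame method, which does the actual work. It mirrors the structure of aggregate(), not its exact implementation.

```r
# Hypothetical generic (illustration only, not part of R).
# UseMethod() picks a method based on the class of the first argument.
summarise_by <- function(x, ...) UseMethod("summarise_by")

# data.frame method: does the real work.
# 'x' holds the response column, 'by' holds the grouping column.
summarise_by.data.frame <- function(x, by, FUN, ...) {
  sapply(split(x[[1L]], by[[1L]]), FUN, ...)
}

# formula method: builds a model frame from 'lhs ~ rhs' and 'data',
# then hands the pieces to the data.frame method.
summarise_by.formula <- function(x, data, FUN, ...) {
  mf <- model.frame(x, data = data)       # column 1 = response, rest = groups
  summarise_by.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
}

# Both calls end up in summarise_by.data.frame()
via_formula <- summarise_by(weight ~ feed, data = chickwts, FUN = mean)
via_df      <- summarise_by(chickwts["weight"], by = chickwts["feed"], FUN = mean)
print(via_formula)
```

The formula method adds no computation of its own; it is only a more convenient way to spell the same request.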

Consider the first example:

R> class(weight ~ feed)
[1] "formula"
R> class(chickwts)
[1] "data.frame"

So aggregate() dispatches on its first argument, which is of class formula, and therefore calls aggregate.formula(). Resolving a formula in R typically revolves around building a model frame (model.frame()) and, for model fitting, a model matrix (model.matrix()). I presumed something similar happens here, with an equivalent call eventually executed by aggregate.data.frame() on the second argument, chickwts, a data.frame.

R> aggregate(weight ~ feed, data = chickwts, mean)
       feed  weight
1    casein 323.583
2 horsebean 160.200
3   linseed 218.750
4  meatmeal 276.909
5   soybean 246.429
6 sunflower 328.917
R> 
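One way to check that equivalence is to call the data.frame interface directly and compare the two results. This sketch relies on the default na.action of the formula method; chickwts has no missing values, so that makes no difference here.

```r
# Formula interface: dispatches to aggregate.formula()
via_formula <- aggregate(weight ~ feed, data = chickwts, FUN = mean)

# data.frame interface: response column(s) as a data frame,
# grouping variable(s) as a named list passed to 'by'
via_df <- aggregate(chickwts["weight"], by = chickwts["feed"], FUN = mean)

# Same grouping column, same means, same row order
print(all.equal(via_formula, via_df))
```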

What you asked isn't the easiest beginner question. I'd recommend a thorough look at the documentation and a decent R book if you have one handy. (Other SO questions give recommendations on what to read next.)

Edit: I had to dig a little, as aggregate.formula() is not exported from the stats namespace, but you can view it by typing stats:::aggregate.formula at the prompt. Doing so clearly shows that it does, in fact, dispatch to aggregate.data.frame():

 [.... some code omitted ...]
    if (is.matrix(mf[[1L]])) {
        lhs <- as.data.frame(mf[[1L]])
        names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
        aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
    }
    else aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
}
<environment: namespace:stats>
R> 
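If you prefer not to use `:::`, getS3method() retrieves a registered S3 method directly. The sketch below fetches aggregate.formula() that way, checks whether it appears among the exports of stats, and searches its deparsed source for the hand-off to aggregate.data.frame(). The string search is only a quick inspection aid, not a formal check.

```r
# Is aggregate.formula in the stats export list? (FALSE means ':::' is needed)
exported <- "aggregate.formula" %in% getNamespaceExports("stats")

# Retrieve the registered S3 method without ':::'
f <- getS3method("aggregate", "formula")

# Deparse the function and look for the call to aggregate.data.frame()
src <- paste(deparse(f), collapse = "\n")
delegates <- grepl("aggregate.data.frame", src, fixed = TRUE)

print(exported)
print(delegates)
```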
answered Sep 22 '22 14:09 by Dirk Eddelbuettel