I am trying to learn by() in R (3.0.1).
This is what I am doing:
attach(iris)
head(iris)
by(iris[,1:4] , Species , mean)
This is what I am getting:
> by(iris[,1:4] , Species , mean)
Species: setosa
[1] NA
------------------------------------------------------------
Species: versicolor
[1] NA
------------------------------------------------------------
Species: virginica
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(data[x, , drop = FALSE], ...) :
argument is not numeric or logical: returning NA
The problem here is that the function you are applying, mean(), doesn't work on a data frame. In effect, for each species you are calling something like this:
R> mean(iris[iris$Species == "setosa", 1:4])
[1] NA
Warning message:
In mean.default(iris[iris$Species == "setosa", 1:4]) :
argument is not numeric or logical: returning NA
i.e. you are passing mean() a data frame of 4 columns, containing the rows of the original where Species == "setosa".
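To see this for yourself, here is a small sketch (base R only; iris ships with the datasets package) that asks by() what each group's subset looks like, then compares mean() on that subset with mean() on one of its columns. The anonymous function is only there for inspection.

```r
# For each species, report the class and dimensions of the subset that by()
# passes to FUN (an anonymous function used purely for inspection).
shapes <- by(iris[, 1:4], iris$Species,
             function(d) paste(class(d), nrow(d), ncol(d)))
shapes  # each group: "data.frame 50 4"

# mean() on that data frame warns and returns NA ...
setosa <- iris[iris$Species == "setosa", 1:4]
suppressWarnings(mean(setosa))  # NA
# ... while mean() on a single numeric column works.
mean(setosa$Sepal.Length)       # 5.006
```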
For by(), you need to do this variable by variable, as in:
R> by(iris[,1] , iris$Species , mean)
iris$Species: setosa
[1] 5.006
------------------------------------------------------------
iris$Species: versicolor
[1] 5.936
------------------------------------------------------------
iris$Species: virginica
[1] 6.588
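Going variable by variable can be automated by looping over the columns. A minimal sketch, swapping in tapply() (base R's per-group apply for a single vector) in place of by(), so that each column gives back a plain named vector of species means; sapply() then binds those into one species-by-variable matrix.

```r
# One tapply() call per column: each returns the three species means.
# sapply() simplifies the four results into a 3 x 4 matrix
# (rows = species, columns = measurements).
group_means <- sapply(iris[, 1:4], function(col) tapply(col, iris$Species, mean))
group_means
group_means["setosa", "Sepal.Length"]  # 5.006
```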
Or use colMeans() instead of mean() as the FUN applied to each subset:
R> by(iris[,1:4] , iris$Species , colMeans)
iris$Species: setosa
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.006 3.428 1.462 0.246
------------------------------------------------------------
iris$Species: versicolor
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.936 2.770 4.260 1.326
------------------------------------------------------------
iris$Species: virginica
Sepal.Length Sepal.Width Petal.Length Petal.Width
6.588 2.974 5.552 2.026
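If you then want the three per-species vectors as a single matrix, one common idiom (a sketch, not the only way) is to rbind() the elements of the "by" object, which behaves like a list with one element per group.

```r
# by() with colMeans returns one named vector of 4 means per species.
res <- by(iris[, 1:4], iris$Species, colMeans)
# Stack the per-species vectors row-wise into a 3 x 4 numeric matrix.
means_mat <- do.call(rbind, res)
means_mat
```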
If a canned function like colMeans() doesn't exist, you can always write a wrapper around sapply(), e.g.:
foo <- function(x, ...) sapply(x, mean, ...)
by(iris[, 1:4], iris$Species, foo)
R> by(iris[, 1:4], iris$Species, foo)
iris$Species: setosa
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.006 3.428 1.462 0.246
------------------------------------------------------------
iris$Species: versicolor
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.936 2.770 4.260 1.326
------------------------------------------------------------
iris$Species: virginica
Sepal.Length Sepal.Width Petal.Length Petal.Width
6.588 2.974 5.552 2.026
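The wrapper approach pays off when there is no col*() counterpart: base R has colMeans() but no colMedians(). Below is a sketch with a hypothetical helper, col_medians(), written the same way as foo(). It also shows that extra arguments given to by() are forwarded to FUN, and the ... in the wrapper passes them on to the summary function (na.rm = TRUE is just an illustrative choice; iris has no missing values).

```r
# Hypothetical helper in the style of foo(): median of every column.
col_medians <- function(x, ...) sapply(x, median, ...)

# na.rm = TRUE travels by() -> col_medians() -> sapply() -> median().
meds <- by(iris[, 1:4], iris$Species, col_medians, na.rm = TRUE)
meds
```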
You might find aggregate() more appealing:
R> with(iris, aggregate(iris[,1:4], list(Species = Species), FUN = mean))
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.006 3.428 1.462 0.246
2 versicolor 5.936 2.770 4.260 1.326
3 virginica 6.588 2.974 5.552 2.026
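aggregate() also has a formula method, which some find easier to read. A sketch of the same summary, where . stands for every column other than Species:

```r
# Formula interface: all remaining columns (.) summarised by Species.
agg <- aggregate(. ~ Species, data = iris, FUN = mean)
agg
```

One difference worth knowing: the formula method drops rows containing NAs by default (its na.action is na.omit), which makes no difference for iris.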
Notice how I use with() to access Species directly; if you don't want to index via iris$Species, this is much better than calling attach(iris).
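Applied to the by() call from earlier, the with() approach might look like the sketch below: Species is found inside iris only while the expression runs, and nothing is added to the search path, so there is no detach() to forget.

```r
# Species is looked up as a column of iris only while this expression runs.
res_with <- with(iris, by(iris[, 1:4], Species, colMeans))
res_with

# Unlike attach(iris), with() leaves the search path untouched
# (FALSE in a fresh session where iris was never attached).
"iris" %in% search()
```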