 

by() giving error when applying mean function over a data frame. What's happening?

Tags: r, na, mean

I am trying to learn by() in R (3.0.1). This is what I am doing:

  1. Open R
  2. attach(iris)
  3. head(iris)
  4. by(iris[,1:4] , Species , mean)

This is what I am getting:

> by(iris[,1:4] , Species , mean)

Species: setosa
[1] NA
------------------------------------------------------------ 
Species: versicolor
[1] NA
------------------------------------------------------------ 
Species: virginica
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

2: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

3: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA
asked Jan 13 '14 19:01 by lovekesh


1 Answer

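The problem here is that the function you are applying, mean(), doesn't work on a data frame. mean() has no data-frame method, so the call falls through to mean.default(), which expects a numeric or logical vector; hence the warning and the NA. In effect you are calling something like this: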

R> mean(iris[iris$Species == "setosa", 1:4])
[1] NA
Warning message:
In mean.default(iris[iris$Species == "setosa", 1:4]) :
  argument is not numeric or logical: returning NA

That is, you are passing mean() a four-column data frame containing the rows of the original where Species == "setosa".
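A quick way to confirm this, as a small sketch: have the function given to by() report what it receives. Each group arrives as a data frame, which mean() cannot summarise, whereas sapply() over that chunk's columns can. The names `got_df` and `chunk` are just illustrative.

```r
# Ask by() to report whether each per-species chunk is a data frame
got_df <- by(iris[, 1:4], iris$Species, function(d) is.data.frame(d))
print(got_df)

# mean() on such a chunk warns and returns NA ...
chunk <- iris[iris$Species == "setosa", 1:4]
print(suppressWarnings(mean(chunk)))

# ... while applying mean() to each column separately works
print(sapply(chunk, mean))
```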

With mean() as the function, you need to call by() one variable at a time, as in:

R> by(iris[,1] , iris$Species , mean)
iris$Species: setosa
[1] 5.006
------------------------------------------------------------ 
iris$Species: versicolor
[1] 5.936
------------------------------------------------------------ 
iris$Species: virginica
[1] 6.588
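To repeat the variable-by-variable approach for all four columns without typing each call, one option (a swapped-in technique, not part of the answer above) is to loop over the columns with sapply() and do the grouping with tapply(). `species_means` is an illustrative name.

```r
# For each numeric column, tapply() computes the per-species mean;
# sapply() collects the results into a 3 x 4 matrix (rows = species, columns = variables)
species_means <- sapply(iris[, 1:4], function(col) tapply(col, iris$Species, mean))
print(species_means)
```

Because each column is a plain numeric vector, mean() receives exactly the kind of input it expects.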

Alternatively, use colMeans() instead of mean() as the FUN argument:

R> by(iris[,1:4] , iris$Species , colMeans)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026
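The result of by() prints nicely but is a list-like "by" object holding one named vector per species. If a single matrix is more convenient, a common idiom (an addition, not from the answer) is to stack those vectors with do.call() and rbind(). `by_means` and `mean_matrix` are illustrative names.

```r
# One named vector of column means per species
by_means <- by(iris[, 1:4], iris$Species, colMeans)

# Stack the per-species vectors into a 3 x 4 matrix
mean_matrix <- do.call(rbind, by_means)
print(mean_matrix)
```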

If no ready-made function like colMeans() exists for the summary you want, you can always write a wrapper around sapply(), e.g.

foo <- function(x, ...) sapply(x, mean, ...)
by(iris[, 1:4], iris$Species, foo)
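The `...` in the wrapper matters: by() forwards extra arguments to FUN, so they reach mean(). A minimal sketch with a missing value planted on purpose; the data frame `iris_na` and its NA are hypothetical, introduced only for illustration.

```r
# Same wrapper as above: extra arguments in ... are forwarded to mean()
foo <- function(x, ...) sapply(x, mean, ...)

# Hypothetical missing value, introduced only for illustration
iris_na <- iris
iris_na[1, "Sepal.Length"] <- NA

# by() passes na.rm = TRUE on to foo(), which hands it to mean()
res <- by(iris_na[, 1:4], iris_na$Species, foo, na.rm = TRUE)
print(res)
```

Without `na.rm = TRUE`, the setosa Sepal.Length mean would come back as NA.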

R> by(iris[, 1:4], iris$Species, foo)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026

You might find aggregate() more appealing:

R> with(iris, aggregate(iris[,1:4], list(Species = Species), FUN = mean))
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
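aggregate() also has a formula interface that avoids repeating the data frame name; a sketch, where `.` on the left-hand side stands for every column other than the grouping variable. Note that this method's default `na.action = na.omit` drops any row containing an NA before summarising.

```r
# Formula interface: "." means all columns except Species
agg <- aggregate(. ~ Species, data = iris, FUN = mean)
print(agg)
```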

Notice how I use with() to access Species directly; this is much better than attach()ing iris if you don't want to index via iris$Species.
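One concrete reason to prefer with(): attach() puts a copy of the columns on the search path, so later changes to the data frame are not seen through the attached names. A small sketch using a hypothetical data frame `dat`:

```r
# A small hypothetical data frame, used only for illustration
dat <- data.frame(g = c("a", "b"), v = c(1, 2))

attach(dat)           # places a copy of dat's columns on the search path
dat$v <- c(10, 20)    # modify the original data frame
stale <- v            # still c(1, 2): the attached copy did not change
detach(dat)

fresh <- with(dat, v) # with() evaluates against the current dat: c(10, 20)
print(stale)
print(fresh)
```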

answered Sep 30 '22 01:09 by Gavin Simpson