
Odd behaviour of the by() function in R 3.0.0?

Tags: r

I am trying to become familiar with the vast universe that constitutes R. There is an excellent function by() which seems to do just what I need, but it doesn't seem to like selection of multiple columns in a data frame.

I used the standard iris dataset. by() behaves well with a single column selected, but returns NA with warnings as soon as I select multiple columns. The example is taken from a reference book, but of course there may be a typo.

First version (this works)

> by(iris[,2],Species,mean)
Species: setosa
[1] 3.428
------------------------------------------------------------ 
Species: versicolor
[1] 2.77
------------------------------------------------------------ 
Species: virginica
[1] 2.974

Second version (this doesn't)

> by(iris[,2:3],Species,mean)
Species: setosa
[1] NA
------------------------------------------------------------ 
Species: versicolor
[1] NA
------------------------------------------------------------ 
Species: virginica
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

Any explanations gratefully received.

Charles Brewer asked Oct 11 '13 19:10


1 Answer

The warning you are getting comes from mean, not from by.
by splits the data frame by group and passes each piece to the function as a data.frame, whereas mean expects a numeric vector.
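
A quick way to check this yourself is to ask by for the class of whatever it hands to the function. This is only a small sketch; the variable name piece_class is mine and not from the question.

```r
# Report the class of the object by() passes to FUN for each species
piece_class <- by(iris[, 2:3], iris$Species, class)

# Each group is handed over as a data.frame, not as a numeric vector
print(piece_class)
```

Every group should come back as "data.frame", which is why mean.default refuses it.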

If you instead use a function that works on data.frames, no warnings are thrown:

by(iris[,2:3],iris$Species, colMeans)
by(iris[,2:3],iris$Species, print)
etc

If you need to, you can nest *ply-type functions (e.g. by, tapply, lapply). Try this, for example:

by(iris[,2:3],iris$Species,lapply, mean)
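
To spell out what the nested call does: by forwards any extra arguments to the function, so lapply receives mean and applies it to each column of the group's data.frame. A rough sketch of equivalent forms follows; the names nested, explicit and compact are just illustrative.

```r
# by() forwards extra arguments to FUN, so lapply() receives `mean`
nested <- by(iris[, 2:3], iris$Species, lapply, mean)

# The same call with the inner step written as an anonymous function
explicit <- by(iris[, 2:3], iris$Species, function(d) lapply(d, mean))

# sapply() gives a named numeric vector per group instead of a list
compact <- by(iris[, 2:3], iris$Species, sapply, mean)

# Pull out one group's result by name
nested[["setosa"]]$Sepal.Width
compact[["virginica"]]
```

Whether you want lapply or sapply here mostly depends on whether a list or a plain numeric vector per group is easier to work with afterwards.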

As for mean:

Notice that if you tried to call mean on any data.frame, it would complain:

mean(iris[,2:3])
mean(iris[iris$Species==iris$Species[[1]] ,2:3])

Use colMeans instead:

colMeans(iris[iris$Species==iris$Species[[1]] ,2:3])
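
Here is the same comparison side by side, as a sketch that assumes an R version that behaves like the one in the question (mean on a data.frame returns NA with a warning). The variable names are mine.

```r
# One group's rows, restricted to the two columns from the question
setosa <- iris[iris$Species == "setosa", 2:3]

# mean() on a data.frame warns and returns NA; the warning is silenced here
whole_frame <- suppressWarnings(mean(setosa))

# colMeans() averages each column separately
per_column <- colMeans(setosa)

# mean() is happy once it is given a single numeric column
one_column <- mean(setosa$Sepal.Width)

print(whole_frame)
print(per_column)
print(one_column)
```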

On an unrelated note: Avoid using attach ;)
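
On the attach point: the calls in the question only find Species because iris was attached beforehand. Two ways to get the same grouping without attach, sketched with base R only (the names explicit_ref and scoped are illustrative):

```r
# Refer to the grouping column explicitly ...
explicit_ref <- by(iris[, 2:3], iris$Species, colMeans)

# ... or use with() to make the columns visible for a single expression
scoped <- with(iris, by(iris[, 2:3], Species, colMeans))

explicit_ref[["versicolor"]]
scoped[["versicolor"]]
```

Both forms keep the lookup local to the call, so nothing lingers on the search path afterwards.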

Ricardo Saporta answered Oct 14 '22 18:10