 

apply() is giving NA values for every column

Tags:

r

apply

I've been having this strange problem with apply lately. Consider the following example:

set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))
head(df)
  speed dist foo
1     4    2   E
2     4   10   E
3     7    4   B
4     7   22   E
5     8   16   D
6     9   10   C

I want to use apply to apply a function fun (say, mean) to each column of that data.frame. If the data.frame contains only numeric values, I have no problem:

apply(cars, 2, mean)
speed  dist 
15.40 42.98 

But when I try it on my data.frame containing numeric and character data, it seems to fail:

apply(df, 2, mean)
speed  dist   foo 
   NA    NA    NA 
Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA

Of course, I was expecting to get NA for the character column, but I would like to get values for the numeric columns anyway.

sapply(df, class)
    speed      dist       foo 
"numeric" "numeric"  "factor" 

Any pointers would be appreciated as I'm feeling like I'm missing something very obvious here!

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Pierre asked Mar 14 '12 21:03



3 Answers

The first sentence of the description for ?apply says:

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

Matrices can only hold a single type in R. When the data frame is coerced to a matrix, everything ends up as character if there is even a single character (or factor) column.
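To see what apply() is actually working with, you can call as.matrix() on the data frame yourself. A minimal sketch: it rebuilds the question's df, and stringsAsFactors = TRUE is my addition so that foo is a factor on current R, as it was by default in R 2.14 (the sampled letters may not match the question's output on newer R versions).

```r
# Rebuild the question's data frame; foo is forced to be a factor
set.seed(42)
df <- data.frame(cars,
                 foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE),
                 stringsAsFactors = TRUE)

# This is the coercion apply() performs on a two-dimensional data frame
m <- as.matrix(df)
typeof(m)            # "character": one non-numeric column turns the whole matrix to character
head(m[, "speed"])   # the speeds are now character strings, so mean() returns NA

# With only numeric columns the matrix stays numeric, which is why apply(cars, 2, mean) works
typeof(as.matrix(cars))  # "double"
```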

I guess I owe you a description of an alternative, so here you go. Data frames are really just lists, so if you want to apply a function to each column, use lapply or sapply instead.
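A minimal sketch of that list-based approach, assuming you only want results for the numeric columns: Filter(is.numeric, df) keeps those columns, and sapply() then passes each one to mean() with its original type intact.

```r
# Rebuild the question's data frame (foo as a factor, as in R 2.14)
set.seed(42)
df <- data.frame(cars,
                 foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE),
                 stringsAsFactors = TRUE)

# Keep the numeric columns, then apply mean() to each column of the list
num_means <- sapply(Filter(is.numeric, df), mean)
num_means   # named numeric vector: speed 15.40, dist 42.98

# lapply() does the same work but returns a list instead of a vector
lapply(Filter(is.numeric, df), mean)
```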

joran answered Nov 19 '22 03:11



apply works on a matrix, and a matrix must be all of one type. So df is transformed into a matrix, and since it contains a non-numeric column, all the columns become character.

> apply(df, 2, class)
      speed        dist         foo 
"character" "character" "character" 

To get what you want, check out the colwise and numcolwise functions in plyr.

> numcolwise(mean)(df)
  speed  dist
1  15.4 42.98
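If you would rather not depend on plyr, something close to numcolwise(mean)(df) can be written in base R. This is a sketch under my own assumptions: num_col_apply is a hypothetical helper (not part of plyr or base R) that keeps the numeric columns, applies f to each, and returns a one-row data frame.

```r
# Rebuild the question's data frame (foo as a factor, as in R 2.14)
set.seed(42)
df <- data.frame(cars,
                 foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE),
                 stringsAsFactors = TRUE)

# Hypothetical helper: returns a function that applies f to every numeric
# column of a data frame and wraps the results in a one-row data frame
num_col_apply <- function(f) {
  function(d) as.data.frame(lapply(Filter(is.numeric, d), f))
}

res <- num_col_apply(mean)(df)
res   # one row: speed 15.4, dist 42.98

# For column means specifically, colMeans() on the numeric columns is shorter
colMeans(Filter(is.numeric, df))
```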
Brian Diggs answered Nov 19 '22 02:11



You are applying a function over the columns of a data.frame. Since a data.frame is a list, you can use lapply or sapply instead of apply:

sapply(df, mean)

speed  dist   foo 
15.40 42.98    NA 
Warning message:
In mean.default(X[[3L]], ...) :
  argument is not numeric or logical: returning NA

And you can avoid the warning message by using an anonymous function that checks whether each column is numeric before calculating the mean:

sapply(df, function(x) if (is.numeric(x)) mean(x) else NA)

speed  dist   foo 
15.40 42.98    NA 
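A variant of the same idea, sketched with vapply() instead of sapply(): FUN.VALUE = numeric(1) makes R check that every call returns exactly one double, so the result is always a numeric vector. Returning NA_real_ (rather than a logical NA) for non-numeric columns is my choice to satisfy that check.

```r
# Rebuild the question's data frame (foo as a factor, as in R 2.14)
set.seed(42)
df <- data.frame(cars,
                 foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE),
                 stringsAsFactors = TRUE)

# Mean of numeric columns, NA_real_ for anything else; vapply() errors
# if any call returns something other than a single double
col_means <- vapply(df,
                    function(x) if (is.numeric(x)) mean(x) else NA_real_,
                    FUN.VALUE = numeric(1))
col_means   # speed 15.40, dist 42.98, foo NA
```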
Andrie answered Nov 19 '22 01:11
