I've been having a strange problem with apply lately. Consider the following example:
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))
head(df)
speed dist foo
1 4 2 E
2 4 10 E
3 7 4 B
4 7 22 E
5 8 16 D
6 9 10 C
I want to use apply to apply a function fun (say, mean) to each column of that data.frame. If the data.frame contains only numeric values, I do not have any problem:
apply(cars, 2, mean)
speed dist
15.40 42.98
But when I try it with my data.frame, which contains both numeric and character data, it seems to fail:
apply(df, 2, mean)
speed dist foo
NA NA NA
Warning messages:
1: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(newX[, i], ...) :
argument is not numeric or logical: returning NA
Of course, I was expecting to get NA for the character column, but I would like to get values for the numeric columns anyway.
sapply(df, class)
speed dist foo
"numeric" "numeric" "factor"
Any pointers would be appreciated as I'm feeling like I'm missing something very obvious here!
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
The first sentence of the Details section of ?apply says:
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
Matrices in R can hold only a single type. When the data frame is coerced to a matrix, every value ends up as character if even one column is non-numeric (here foo, which is a factor).
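The coercion can be checked directly. The following is a minimal sketch that rebuilds the df from the question and calls as.matrix by hand, which is the conversion the documentation says apply performs on a data frame; the name m is only illustrative.

```r
# Rebuild the example data from the question.
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# as.matrix() is the coercion apply() uses for a two-dimensional data frame.
m <- as.matrix(df)

typeof(m)                  # "character": the whole matrix shares one type
is.numeric(m[, "speed"])   # FALSE: speed is now stored as text

# With only numeric columns the matrix stays numeric, so apply() works.
typeof(as.matrix(cars))    # "double"
```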
I guess I owe you a description of an alternative, so here you go. Data frames are really just lists, so if you want to apply a function to each column, use lapply or sapply instead.
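As a rough illustration of the list view, the sketch below rebuilds the df from the question, confirms that it is a list, and takes the mean of the numeric columns only. The names num_cols and means are hypothetical helpers, not part of the original question.

```r
# Rebuild the example data from the question.
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

is.list(df)   # TRUE: each column is one list element and keeps its own type

# Logical vector marking the numeric columns (hypothetical helper name).
num_cols <- sapply(df, is.numeric)

# Apply mean() column by column, skipping the non-numeric column entirely.
means <- sapply(df[num_cols], mean)
means
```

Because the non-numeric column is dropped before mean is called, no warning is raised and no NA appears in the result.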
apply works on a matrix, and a matrix must be of a single type. So df is converted to a matrix, and since it contains a non-numeric column, all the columns become character.
> apply(df, 2, class)
speed dist foo
"character" "character" "character"
To get what you want, check out the colwise and numcolwise functions in plyr.
> numcolwise(mean)(df)
speed dist
1 15.4 42.98
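If loading plyr is not an option, something similar can be sketched in base R. This is an alternative technique, not what numcolwise does internally: it keeps the numeric columns with Filter and passes them to colMeans, and it returns a named vector rather than a one-row data frame. The name num_means is hypothetical.

```r
# Rebuild the example data from the question.
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# Filter() keeps the columns for which is.numeric() is TRUE;
# colMeans() then averages each remaining column.
num_means <- colMeans(Filter(is.numeric, df))
num_means
```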
You are applying a function over the columns of a data.frame. Since a data.frame is a list, you can use lapply or sapply instead of apply:
sapply(df, mean)
speed dist foo
15.40 42.98 NA
Warning message:
In mean.default(X[[3L]], ...) :
argument is not numeric or logical: returning NA
And you can remove the warning message by using an anonymous function that tests for class numeric before calculating the mean:
sapply(df, function(x) if (is.numeric(x)) mean(x) else NA)
speed dist foo
15.40 42.98 NA
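A slightly stricter variant of the same idea, offered as a sketch rather than a required change, uses vapply so that the result is guaranteed to be one numeric value per column. The helper name safe_mean is hypothetical.

```r
# Rebuild the example data from the question.
set.seed(42)
df <- data.frame(cars, foo = sample(LETTERS[1:5], size = nrow(cars), replace = TRUE))

# Hypothetical helper: mean for numeric columns, NA_real_ for everything else.
safe_mean <- function(x) if (is.numeric(x)) mean(x) else NA_real_

# vapply() checks that every result is a single numeric value.
res <- vapply(df, safe_mean, numeric(1))
res
```

Unlike sapply, vapply stops with an error if the function ever returns something other than a length-one numeric, which makes the return type predictable.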