Could someone please explain the differences between how apply() and sapply() operate on the columns of a data frame?
For example, when attempting to find the class of each column in a data frame, my first inclination is to use apply on the columns:
> apply(iris, 2, class)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"character" "character" "character" "character" "character"
This is not correct, however, as some of the columns are numeric:
> class(iris$Petal.Length)
[1] "numeric"
A quick Google search turned up this solution, which uses sapply instead of apply:
> sapply(iris, class)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"numeric" "numeric" "numeric" "numeric" "factor"
In this case, sapply is implicitly converting iris to a list and then applying the function to each entry in the list, e.g.:
> class(as.list(iris)$Petal.Length)
[1] "numeric"
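Worth noting: a data frame is already stored as a list of its columns, so this "conversion" keeps each column's original class. A small base R sketch to check that (the variable names are illustrative only):

```r
# A data frame is stored as a list whose elements are its columns
is_list <- is.list(iris)          # TRUE
n_elements <- length(iris)        # 5, one element per column

# sapply() iterates over those list elements, so each column keeps its class
classes_sapply <- sapply(iris, class)

# The same result built by hand: plain list -> class() on each element -> flatten
classes_manual <- unlist(lapply(as.list(iris), class))

print(identical(classes_sapply, classes_manual))
```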
What I'm still unclear about is why my original attempt using apply didn't work.
Difference between the apply() and sapply() functions: apply() is designed for matrices and arrays (it also accepts a data frame, but coerces it to a matrix first), whereas sapply() takes a vector or a list, including a data frame, which it treats as a list of columns. The lapply() function accepts the same inputs as sapply().
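A minimal sketch of those input expectations, using toy objects made up for illustration: apply() wants an object with dimensions, while sapply() and lapply() walk over the elements of a vector or list.

```r
# apply() needs an object with dimensions, such as a matrix
m <- matrix(1:6, nrow = 2)              # 2 x 3 integer matrix, filled by column
col_max <- apply(m, 2, max)             # column-wise maxima: 2, 4, 6

# A plain vector has no dim attribute, so apply() refuses it
apply_on_vector <- tryCatch(
  apply(c(1, 4, 9), 1, sqrt),
  error = function(e) conditionMessage(e)  # keep the error message instead of stopping
)

# sapply() and lapply() iterate over the elements of a vector or list
roots <- sapply(c(1, 4, 9), sqrt)                     # 1, 2, 3
lens <- lapply(list(a = 1:3, b = letters), length)   # list(a = 3, b = 26)
```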
Difference between the lapply() and sapply() functions: lapply() always returns a list, whereas sapply() tries to simplify the result to a vector (or a matrix) and falls back to a list when it cannot. Both apply a function to each element of a list or vector.
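A short sketch of that difference on a made-up list; the last call shows the case where sapply() cannot simplify and hands back a list, just like lapply():

```r
x <- list(a = 1:3, b = 4:6)

# lapply() always returns a list, one element per input element
res_lapply <- lapply(x, mean)

# sapply() simplifies when it can: length-1 results become a named vector
res_sapply <- sapply(x, mean)            # a = 2, b = 5

# Equal-length results become a matrix, one column per input element
res_range <- sapply(x, range)            # 2 x 2 matrix

# Results of different lengths cannot be simplified, so sapply() returns a list
res_ragged <- sapply(list(a = 1:2, b = 1:3), identity)
```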
Difference between the lapply() and apply() functions: the main difference lies in what they return. lapply() always returns a list, works on lists and data frames (visiting each column of a data frame), and does not need a MARGIN argument.
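A sketch comparing the two on the numeric columns of iris (the first four columns, chosen here so that apply() does not run into the character coercion discussed below):

```r
num_df <- iris[1:4]                        # only the numeric columns

# lapply() has no MARGIN: on a data frame it always visits columns
col_means <- lapply(num_df, mean)          # named list, one mean per column

# apply() needs MARGIN (2 = columns) and returns a named vector here
col_means_apply <- apply(num_df, 2, mean)

# Same numbers, different container
print(all.equal(unlist(col_means), col_means_apply))
```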
The apply() function applies a function to the rows or columns of a matrix or data frame. It takes the matrix or data frame, a MARGIN saying whether to work by row (1) or by column (2), and the function to apply, and returns the results as a vector, an array, or a list.
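A minimal sketch of the MARGIN argument on a small, made-up matrix with row and column names:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("A", "B", "C")))

row_sums <- apply(m, 1, sum)     # by row:    r1 = 9, r2 = 12
col_sums <- apply(m, 2, sum)     # by column: A = 3, B = 7, C = 11

# When FUN returns more than one value, each result becomes a column,
# so apply() returns a matrix (here 2 x 3: min and max of each column)
col_ranges <- apply(m, 2, range)
```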
As often seems to be the case, I figured out the answer to my question in the process of writing it up. I'm posting it here in case anyone else has the same question.
A closer look at ?apply reveals:
If ‘X’ is not an array but an object of a class with a non-null ‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data frame) or via ‘as.array’.
So just as sapply casts the data frame to a list before operating on it, apply casts the data frame to a matrix. Since a matrix cannot hold mixed types and at least one column (Species) contains non-numeric data, everything becomes character data:
> class(as.matrix(iris)[,'Petal.Length'])
[1] "character"
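If you do want apply() on a data frame, one workaround is to restrict it to columns that share a type, so as.matrix() produces a numeric matrix. Another option, not mentioned above, is vapply(), a base R relative of sapply() that also iterates over columns but makes you declare the type of each result. A minimal sketch (variable names are illustrative):

```r
# as.matrix() on the mixed-type data frame yields a character matrix
mat_type <- typeof(as.matrix(iris))                   # "character"

# Keep only the numeric columns, so the coerced matrix stays numeric
num_cols <- vapply(iris, is.numeric, logical(1))      # TRUE for the 4 measurements
num_classes <- apply(iris[num_cols], 2, class)        # all "numeric"

# vapply() visits columns like sapply(), but checks each result is one string
col_classes <- vapply(iris, class, character(1))
```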