Difference between apply and sapply for data frame columns?

Could someone please explain the differences between how apply() and sapply() operate on the columns of a data frame?

For example, when attempting to find the class of each column in a data frame, my first inclination is to use apply on the columns:

> apply(iris, 2, class)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
 "character"  "character"  "character"  "character"  "character" 

This is not correct, however, as some of the columns are numeric:

> class(iris$Petal.Length)
[1] "numeric"

A quick Google search turned up this solution, which uses sapply instead of apply:

> sapply(iris, class)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

In this case, sapply is implicitly converting iris to a list, and then applying the function to each entry in the list, e.g.:

> class(as.list(iris)$Petal.Length)
[1] "numeric"
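A data frame is already stored as a list whose elements are its columns, which is why sapply() can walk over the columns directly. The sketch below uses the built-in iris dataset; the variable names col_classes_list and col_classes_vec are only illustrative. It checks that iris is a list and that sapply() gives the same result as lapply() followed by unlist():

```r
# A data frame is a list with one element per column
is.list(iris)     # TRUE
length(iris)      # 5

# lapply() visits each column and returns a list of results
col_classes_list <- lapply(iris, class)

# sapply() does the same, then simplifies to a named character vector
col_classes_vec <- sapply(iris, class)

# Flattening the lapply() result gives the same vector
identical(unlist(col_classes_list), col_classes_vec)   # TRUE
```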

What I'm still unclear about is why my original attempt using apply didn't work.

asked Aug 23 '16 at 15:08 by Keith Hughitt



People also ask

What is the difference between sapply and apply?

apply() operates on the margins (rows or columns) of a matrix or array; a data frame passed to it is first coerced with as.matrix(). sapply(), like lapply(), operates on the elements of a vector or list. Because a data frame is a list of columns, sapply() iterates over those columns without any coercion.
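A minimal sketch of the two input styles, assuming a small hypothetical integer matrix m (filled column by column) and toy vectors and lists chosen only for illustration:

```r
# Hypothetical 2 x 3 integer matrix: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# apply() iterates over a margin of a matrix; MARGIN = 2 means columns
col_sums <- apply(m, 2, sum)                      # 3 7 11

# sapply() iterates over the elements of a vector or list
squares <- sapply(1:3, function(x) x^2)           # 1 4 9
totals  <- sapply(list(a = 1:3, b = 4:6), sum)    # a = 6, b = 15
```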

What is the difference between lapply() and sapply()?

Both apply a function to each element of a list or vector. lapply() always returns a list, whereas sapply() tries to simplify the result to a vector or matrix and returns a list only when simplification is not possible (for example, when the individual results have different lengths).
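The difference in return types can be seen with a toy named list (a hypothetical example, not from the original question). range() returns two values per element, so sapply() can bind them into a matrix, while results of unequal length force it back to a list:

```r
x <- list(a = 1:3, b = 4:6)

res_l <- lapply(x, range)   # always a list: $a is 1 3, $b is 4 6
res_s <- sapply(x, range)   # simplified to a 2 x 2 matrix, one column per element

is.list(res_l)   # TRUE
dim(res_s)       # 2 2

# Results of lengths 2 and 3 cannot be simplified, so sapply() returns a list
res_uneven <- sapply(list(1:2, 1:3), function(v) v * 2)
is.list(res_uneven)   # TRUE
```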

What is the difference between apply and lapply in R?

The main differences are the input and the output. lapply() works on the elements of a list or vector (including the columns of a data frame), takes no MARGIN argument, and always returns a list. apply() works on the margins of a matrix or array, requires MARGIN, and returns a vector, array, or list depending on what the function returns.
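A short sketch of the MARGIN distinction, using a hypothetical matrix m and a hand-built list of its rows (the names r1 and r2 are illustrative only):

```r
# Hypothetical 2 x 3 integer matrix: rows are (1,3,5) and (2,4,6)
m <- matrix(1:6, nrow = 2)

# apply() requires MARGIN: 1 = rows, 2 = columns
row_max <- apply(m, 1, max)    # 5 6

# lapply() has no MARGIN; it visits each element of a list and
# always returns a list, here one element per row of m
rows <- list(r1 = m[1, ], r2 = m[2, ])
row_max_list <- lapply(rows, max)   # list(r1 = 5, r2 = 6)
```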

What does apply() do in R?

apply() applies a function to the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix, or to the margins of a higher-dimensional array. A data frame is first coerced to a matrix. The result is a vector, a matrix or array, or a list, depending on what the function returns for each row or column.
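To illustrate how the shape of the result follows from what the function returns, here is a sketch on a hypothetical matrix m; the anonymous function in the second call is contrived purely to produce results of different lengths:

```r
# Hypothetical 2 x 3 integer matrix: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# range() returns 2 values per column, so apply() binds them into a matrix
# with one column per column of m
col_ranges <- apply(m, 2, range)
dim(col_ranges)   # 2 3

# seq_len(col[2]) has length 2, 4 and 6 for the three columns;
# results of different lengths cannot be bound, so apply() returns a list
uneven <- apply(m, 2, function(col) seq_len(col[2]))
is.list(uneven)   # TRUE
```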


1 Answer

As often seems to be the case, I figured out the answer to my question in the process of writing it up. I'm posting the answer here in case anyone else has the same question.

A closer look at ?apply explains it:

If ‘X’ is not an array but an object of a class with a non-null ‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data frame) or via ‘as.array’.

So, just as sapply treats the data frame as a list before operating on it, apply coerces the data frame to a matrix. Since a matrix cannot hold mixed types and at least one column (Species) is non-numeric, every value is converted to character data:

> class(as.matrix(iris)[,'Petal.Length'])
[1] "character"
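The coercion can be confirmed directly, and it also suggests two workarounds. The sketch below assumes the built-in iris dataset, whose first four columns are numeric; vapply() is used here as a type-checked alternative to sapply(), not as the only fix:

```r
# as.matrix() on a data frame with a factor column gives a character matrix
typeof(as.matrix(iris))            # "character"

# With only the numeric columns, the matrix stays numeric,
# so apply() reports the expected class for each column
apply(iris[, 1:4], 2, class)       # all "numeric"

# For mixed data frames, iterate over the list of columns instead;
# vapply() also checks that each result is a single character string
col_classes <- vapply(iris, class, character(1))
```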
answered Sep 30 '22 at 17:09 by Keith Hughitt
