The documentation says: "vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer [...] to use."

Could you please elaborate as to why it is generally safer, maybe providing examples?

P.S.: I know the answer and I already tend to avoid sapply. I just wish there was a nice answer here on SO so I can point my coworkers to it. Please, no "read the manual" answer.


Why is `vapply` safer than `sapply`?

2 Answers

The extra keystrokes involved with vapply could save you time debugging confusing results later. If the function you're calling can return different data types, vapply should certainly be used.

One example that comes to mind would be sqlQuery in the RODBC package. If there's an error executing a query, this function returns a character vector with the message. So, for example, say you're trying to iterate over a vector of table names tnames and select the max value from the numeric column 'NumCol' in each table with:

    sapply(tnames,
           function(tname) sqlQuery(cnxn, paste("SELECT MAX(NumCol) FROM", tname))[[1]])

If all the table names are valid, this would result in a numeric vector. But if one of the table names happens to change in the database and the query fails, the results are going to be coerced into mode character. Using vapply with FUN.VALUE=numeric(1), however, will stop with an error right at this call, rather than letting the problem pop up somewhere down the line or, worse, not at all.
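The same failure mode can be sketched without a database. In the snippet below, fake_query is a hypothetical stand-in for the sqlQuery(...)[[1]] call (it is not part of RODBC), and the table names and error text are made up for illustration.

```r
# Hypothetical stand-in for sqlQuery(...)[[1]]: returns a number for known
# tables and a character message (made up here) for unknown ones.
fake_query <- function(tname) {
  known <- c(orders = 10.5, users = 42)
  if (tname %in% names(known)) known[[tname]] else "ERROR: table not found"
}

tnames <- c("orders", "users", "missing_table")

# sapply silently coerces the whole result to character.
res_s <- sapply(tnames, fake_query)
is.character(res_s)

# vapply refuses the character result and fails at the call site;
# tryCatch is only used here to capture the message instead of stopping.
res_v <- tryCatch(
  vapply(tnames, fake_query, FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)

# With only valid table names, vapply returns the expected numeric vector.
ok <- vapply(tnames[1:2], fake_query, FUN.VALUE = numeric(1))
```

Here res_v ends up holding vapply's error message, whereas res_s is a character vector that would quietly flow into any later numeric code.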

answered Sep 20 '22 02:09

Matthew Plourde

As has already been noted, vapply does two things:

Slight speed improvement
Improves consistency by providing limited return type checks.

The second point is the greater advantage, as it helps catch errors before they happen and leads to more robust code. This return value checking could be done separately by using sapply followed by stopifnot to make sure that the return values are consistent with what you expected, but vapply is a little easier (if more limited, since custom error checking code could check for values within bounds, etc.).
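As a rough sketch of that comparison, the two helpers below perform the same check in both styles. The names find_d, checked_sapply and checked_vapply are made up for this illustration, and the inputs are toy data.

```r
# Hypothetical helper: keep only the elements equal to "d".
find_d <- function(x) x[x == "d"]

good <- list(c("a", "d"), c("d", "e"))   # exactly one "d" per element
bad  <- list(c("a", "d"), c("d", "d"))   # second element has two

# Manual route: call sapply, then assert the shape you expected.
# Custom conditions (e.g. bounds checks) could be added to stopifnot().
checked_sapply <- function(xs) {
  res <- sapply(xs, find_d)
  stopifnot(is.character(res), length(res) == length(xs))
  res
}

# vapply bundles the type and length check into the call itself.
checked_vapply <- function(xs) vapply(xs, find_d, character(1))

same_on_good <- identical(checked_sapply(good), checked_vapply(good))

# Both variants raise an error on the malformed input.
manual_err <- inherits(try(checked_sapply(bad), silent = TRUE), "try-error")
vapply_err <- inherits(try(checked_vapply(bad), silent = TRUE), "try-error")
```

The manual version is more flexible, but it is easy to forget; with vapply the check cannot be left out.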

Here's an example of vapply ensuring your result is as expected. This parallels something I was just working on while PDF scraping, where findD would use a regex to match a pattern in raw text data (e.g. I'd have a list that was split by entity, and a regex to match addresses within each entity. Occasionally the PDF had been converted out-of-order and there would be two addresses for an entity, which caused badness).

    > input1 <- list( letters[1:5], letters[3:12], letters[c(5,2,4,7,1)] )
    > input2 <- list( letters[1:5], letters[3:12], letters[c(2,5,4,7,15,4)] )
    > findD <- function(x) x[x=="d"]
    > sapply(input1, findD )
    [1] "d" "d" "d"
    > sapply(input2, findD )
    [[1]]
    [1] "d"

    [[2]]
    [1] "d"

    [[3]]
    [1] "d" "d"

    > vapply(input1, findD, "" )
    [1] "d" "d" "d"
    > vapply(input2, findD, "" )
    Error in vapply(input2, findD, "") : values must be length 1,
     but FUN(X[[3]]) result is length 2

Because there are two d's in the third element of input2, vapply produces an error. But sapply changes the class of the output from a character vector to a list, which could break code downstream.

As I tell my students, part of becoming a programmer is changing your mindset from "errors are annoying" to "errors are my friend."

Zero length inputs
One related point is that if the input length is zero, sapply will always return an empty list, regardless of the input type. Compare:

    sapply(1:5, identity)
    ## [1] 1 2 3 4 5
    sapply(integer(), identity)
    ## list()

    vapply(1:5, identity, integer(1))
    ## [1] 1 2 3 4 5
    vapply(integer(), identity, integer(1))
    ## integer(0)

With vapply, you are guaranteed to have a particular type of output, so you don't need to write extra checks for zero length inputs.
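To see why this matters inside a function, here is a small sketch. The helpers squares_s and squares_v are hypothetical and differ only in which apply function they use.

```r
# Two hypothetical versions of the same helper; both square each element.
squares_s <- function(x) sapply(x, function(v) v^2)
squares_v <- function(x) vapply(x, function(v) v^2, numeric(1))

empty <- numeric(0)

# The sapply version changes type on empty input ...
empty_s_is_list <- is.list(squares_s(empty))

# ... while the vapply version still returns a numeric vector.
empty_v_is_numeric <- is.numeric(squares_v(empty))

# On non-empty input the two versions agree.
agree <- identical(squares_s(c(1, 2, 3)), squares_v(c(1, 2, 3)))
```

A caller of squares_v can treat the result as numeric without a special case for empty input; a caller of squares_s cannot.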

Benchmarks

vapply can be a bit faster because it already knows what format it should be expecting the results in.

    input1.long <- rep(input1,10000)

    library(microbenchmark)
    m <- microbenchmark(
      sapply(input1.long, findD ),
      vapply(input1.long, findD, "" )
    )
    library(ggplot2)
    library(taRifx) # autoplot.microbenchmark is moving to the microbenchmark package in the next release so this should be unnecessary soon
    autoplot(m)

[Figure: autoplot of the microbenchmark timings for sapply and vapply]


answered Sep 23 '22 02:09

Ari B. Friedman


Tags: r, r-faq, apply

Asked by: flodel
