What are the advantages of the "apply" functions? When are they better to use than "for" loops, and when are they not? [duplicate]

Tags: for-loop, parsing, r, compilation, lapply

Possible Duplicate:
Is R's apply family more than syntactic sugar

Just what the title says. Stupid question, perhaps, but my understanding has been that when using an "apply" function, the iteration is performed in compiled code rather than in the R parser. This would seem to imply that lapply, for instance, is only faster than a "for" loop if there are a great many iterations and each operation is relatively simple. For instance, if a single call to a function wrapped up in lapply takes 10 seconds, and there are only, say, 12 iterations of it, I would imagine that there's virtually no difference at all between using "for" and "lapply".

Now that I think of it, if the function inside the "lapply" has to be parsed anyway, why should there be ANY performance benefit from using "lapply" instead of "for" unless you're doing something that there are compiled functions for (like summing or multiplying, etc)?

Thanks in advance!

Josh

asked Jun 23 '11 21:06

Josh

2 Answers

There are several reasons why one might prefer an apply family function over a for loop, or vice-versa.

Firstly, for() and apply(), sapply() will generally be just as quick as each other if used correctly. lapply() does more of its work in compiled code within the R internals than the others, so it can be faster than those functions. The speed advantage appears to be greatest when the act of "looping" over the data is a significant part of the compute time; in many everyday uses you are unlikely to gain much from the inherently quicker lapply(). In the end, all of these will be calling R functions, so those functions still need to be interpreted and then run.

for() loops can often be easier to implement, especially if you come from a programming background where loops are prevalent. Working in a loop may be more natural than forcing the iterative computation into one of the apply family functions. However, to use for() loops properly, you need to do some extra work to set up storage and to manage plugging the outputs of the loop back together. The apply functions do this for you automagically. E.g.:

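IN <- runif(10)
OUT <- logical(length = length(IN))   # pre-allocate storage for the results
for(i in seq_along(IN)) {             # loop over indices, not over the values
    OUT[i] <- IN[i] > 0.5             # fill in one element per iteration
}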

That is a silly example, as > is a vectorised operator, but I wanted something to make a point, namely that you have to manage the output. The main thing is that with for() loops you should always allocate sufficient storage to hold the outputs before you start the loop. If you don't know how much storage you will need, allocate a reasonable chunk of storage, then in the loop check whether you have exhausted that storage, and if so bolt on another big chunk.
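A minimal sketch of that grow-in-chunks strategy, assuming a hypothetical task where the number of outputs isn't known in advance (keeping only the inputs above 0.5). The chunk size of 10 is an arbitrary choice for illustration:

```r
set.seed(1)
IN <- runif(100)

chunk <- 10                 # arbitrary chunk size for this sketch
OUT <- numeric(chunk)       # initial allocation
n <- 0                      # number of slots actually filled so far

for (i in seq_along(IN)) {
    if (IN[i] > 0.5) {
        n <- n + 1
        if (n > length(OUT)) {
            # storage exhausted: bolt on another chunk
            OUT <- c(OUT, numeric(chunk))
        }
        OUT[n] <- IN[i]
    }
}

OUT <- OUT[seq_len(n)]      # drop the unused slots at the end
```

Each c() call copies the existing vector, so a common variant grows by doubling instead (e.g. `OUT <- c(OUT, numeric(length(OUT)))`) to keep the number of copies small. For this particular task `IN[IN > 0.5]` is, of course, the idiomatic answer.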

The main reason, in my mind, for using one of the apply family of functions is more elegant, readable code. Rather than managing the output storage and setting up the loop (as shown above), we can let R handle that and succinctly ask R to run a function on subsets of our data. Speed usually does not enter into the decision, for me at least. I use the function that suits the situation best and results in simple, easy-to-understand code, because if I can't remember what the code is doing a day or a week or more later, I'm likely to waste far more time than I saved by always choosing the fastest function!
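For comparison with the for() loop above, here is a sketch of the same toy computation using sapply(), plus vapply(), a type-checked sibling that the answer doesn't mention. No output storage has to be set up by hand:

```r
IN <- runif(10)

# sapply() allocates the result and simplifies it to a logical vector
OUT_sapply <- sapply(IN, function(x) x > 0.5)

# vapply() does the same, but also checks that every call returns logical(1)
OUT_vapply <- vapply(IN, function(x) x > 0.5, logical(1))
```

vapply() fails loudly if FUN ever returns something other than a single logical, which can make the intent of the code clearer than with sapply().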

The apply family lends itself to scalar or vector operations, whereas a for() loop often lends itself to doing multiple iterated operations using the same index i. For example, I have written code that uses for() loops to do k-fold or bootstrap cross-validation on objects. I would probably never entertain doing that with one of the apply family, as each CV iteration needs multiple operations, access to lots of objects in the current frame, and fills in several output objects that hold the results of the iterations.
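To illustrate the kind of loop described here, the following is a sketch of k-fold cross-validation. The model (`lm(mpg ~ wt)`), the built-in `mtcars` data and k = 5 are assumptions chosen only for the example. Each iteration does several steps and fills two separate outputs using the same index i:

```r
# Hypothetical k-fold cross-validation of lm(mpg ~ wt) on the built-in mtcars data
set.seed(1)
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))  # random fold labels

# Several output objects, all filled using the same index i
rmse  <- numeric(k)
coefs <- matrix(NA_real_, nrow = k, ncol = 2,
                dimnames = list(NULL, c("(Intercept)", "wt")))

for (i in seq_len(k)) {
    train <- mtcars[folds != i, ]
    test  <- mtcars[folds == i, ]
    fit   <- lm(mpg ~ wt, data = train)
    pred  <- predict(fit, newdata = test)
    rmse[i]    <- sqrt(mean((test$mpg - pred)^2))   # held-out error for fold i
    coefs[i, ] <- coef(fit)                         # fitted coefficients for fold i
}
```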

As to the last point, about why lapply() can possibly be faster than for() or apply(), you need to realise that the "loop" can be performed in interpreted R code or in compiled code. Yes, both will still be calling R functions that need to be interpreted, but if the looping and the calling are done directly from compiled C code (as in lapply()), that is where the performance gain can come from, compared with, say, apply(), which boils down to a for() loop in actual R code. Look at the source for apply() to see that it is a wrapper around a for() loop, and then look at the code for lapply(), which is:

> lapply
function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<environment: namespace:base>

and you should see why there can be a difference in speed between lapply() and for() and the other apply family functions. .Internal() is one of R's ways of calling compiled C code used by R itself. Apart from coercing X to a list where needed and a sanity check on FUN (via match.fun()), the entire computation is done in C, which calls the R function FUN on each element. Compare that with the source for apply().
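As a sketch, one rough way to check this on your own machine is to time lapply() against an equivalent pre-allocated for() loop calling the same cheap function. `f` and the input size are hypothetical choices, and no particular timing is claimed here; the point is only how to measure it:

```r
f <- function(x) x^2 + 1          # a cheap function, so loop overhead matters
X <- as.list(seq_len(1e5))

# lapply(): the iteration happens in C via .Internal(lapply(X, FUN))
t_lapply <- system.time(res_lapply <- lapply(X, f))

# for(): the iteration happens in R code, with pre-allocated output
t_for <- system.time({
    res_for <- vector("list", length(X))
    for (i in seq_along(X)) res_for[[i]] <- f(X[[i]])
})

# Elapsed seconds for each approach
c(lapply = t_lapply[["elapsed"]], for_loop = t_for[["elapsed"]])
```

Results depend on the machine and the R version. Since R 3.4.0, loops are byte-compiled by the JIT compiler by default, which narrows the gap.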

answered Sep 29 '22 14:09

Gavin Simpson

From Burns' R Inferno (pdf), p25:

Use an explicit for loop when each iteration is a non-trivial task. But a simple loop can be more clearly and compactly expressed using an apply function. There is at least one exception to this rule ... if the result will be a list and some of the components can be NULL, then a for loop is trouble (big trouble) and lapply gives the expected answer.
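A small sketch of the pitfall the quote describes, using a hypothetical `safe_sqrt()` that returns NULL for negative input. Assigning NULL with `[[<-` deletes a list element, whereas lapply() keeps the NULL in place:

```r
# Hypothetical helper: returns NULL for inputs it cannot handle
safe_sqrt <- function(x) if (x < 0) NULL else sqrt(x)
x <- c(4, 9, -1)

res_lapply <- lapply(x, safe_sqrt)        # list(2, 3, NULL): length 3

# Naive for loop: assigning NULL with [[<- deletes that element
res_for <- vector("list", length(x))
for (i in seq_along(x)) res_for[[i]] <- safe_sqrt(x[i])
# res_for is now list(2, 3): the third element was dropped

# Working alternative: assign a length-1 list with [<- to keep a NULL entry
res_fixed <- vector("list", length(x))
for (i in seq_along(x)) res_fixed[i] <- list(safe_sqrt(x[i]))
```

The `res[i] <- list(value)` idiom is one way to make a loop behave like lapply() here; simply using lapply() avoids the issue altogether.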

answered Sep 29 '22 14:09

Richie Cotton
