
Preventing performance regressions in R

What is a good workflow for detecting performance regressions in R packages? Ideally, I'm looking for something that integrates with R CMD check that alerts me when I have introduced a significant performance regression in my code.

What is a good workflow in general? What other languages provide good tools? Is it something that can be built on top of unit testing, or something that is usually done separately?

asked Dec 11 '11 by hadley


1 Answer

This is a very challenging question, and one that I'm frequently dealing with, as I swap out different code in a package to speed things up. Sometimes a performance regression comes along with a change in algorithms or implementation, but it may also arise due to changes in the data structures used.

What is a good workflow for detecting performance regressions in R packages?

In my case, I tend to have very specific use cases that I'm trying to speed up, with different fixed data sets. As Spacedman wrote, it's important to have a fixed computing system, but that's almost infeasible: sometimes a shared computer may have other processes that slow things down 10-20%, even when it looks quite idle.

My steps:

  1. Standardize the platform (e.g. one or a few machines, a particular virtual machine, or a virtual machine + specific infrastructure, a la Amazon's EC2 instance types).
  2. Standardize the data set that will be used for speed testing.
  3. Create scripts and fixed intermediate data output (i.e. saved to .rdat files) that involve very minimal data transformations. My focus is on some kind of modeling, rather than data manipulation or transformation. This means that I want to give exactly the same block of data to the modeling functions. If, however, data transformation is the goal, then be sure that the pre-transformed/manipulated data is as close as possible to standard across tests of different versions of the package. (See this question for examples of memoization, caching, etc., that can be used to standardize or speed up non-focal computations. It references several packages by the OP.)
  4. Repeat tests multiple times.
  5. Scale the results relative to fixed benchmarks, e.g. the time to perform a linear regression, to sort a matrix, etc. This can allow for "local" or transient variations in infrastructure, such as may be due to I/O, the memory system, dependent packages, etc. (A minimal sketch of steps 2-5 follows this list.)
  6. Examine the profiling output as vigorously as possible (see this question for some insights, also referencing tools from the OP).
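
As a rough illustration of steps 2 through 5, here is a minimal base-R sketch; `lm()` and `sort()` are only stand-ins for the focal package call and the fixed reference benchmark:

```r
## Minimal sketch of steps 2-5: a fixed data set, repeated timings, and
## scaling against a reference task. lm() and sort() are stand-ins for
## the focal package call and the fixed reference benchmark.
set.seed(1)                                          # standardized data set
dat <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

time_once <- function(expr) system.time(expr)[["elapsed"]]

n_reps   <- 20
focal    <- replicate(n_reps, time_once(lm(y ~ x, data = dat)))
baseline <- replicate(n_reps, time_once(sort(rnorm(1e6))))

## Scale the focal timings by the baseline to absorb transient machine effects
ratio <- median(focal) / median(baseline)
cat(sprintf("median focal: %.3fs  median baseline: %.3fs  ratio: %.2f\n",
            median(focal), median(baseline), ratio))
```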

Ideally, I'm looking for something that integrates with R CMD check that alerts me when I have introduced a significant performance regression in my code.

Unfortunately, I don't have an answer for this.

What is a good workflow in general?

For me, it's quite similar to general dynamic code testing: is the output (execution time in this case) reproducible, optimal, and transparent? Transparency comes from understanding what affects the overall time. This is where Mike Dunlavey's suggestions are important, but I prefer to go further, with a line profiler.

Regarding a line profiler, see my previous question, which refers to options in Python and Matlab for other examples. It's most important to examine clock time, but also very important to track memory allocation, number of times the line is executed, and call stack depth.
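
In R itself, a minimal sketch of line-level profiling with base R's `Rprof()` (available since R 3.0.0); the profiled script name here is a hypothetical placeholder:

```r
## Line-level profiling with base R's Rprof() (R >= 3.0.0). Source the
## script with keep.source = TRUE so samples can be attributed to
## individual source lines.
Rprof("profile.out", line.profiling = TRUE, memory.profiling = TRUE)
source("slow_script.R", keep.source = TRUE)   # hypothetical script under test
Rprof(NULL)                                   # stop profiling

## Time (and memory) broken down by source line
summaryRprof("profile.out", lines = "show", memory = "both")
```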

What other languages provide good tools?

Almost all other languages have better tools. :) Interpreted languages like Python and Matlab have good and possibly familiar examples of tools that can be adapted for this purpose. Although dynamic analysis is very important, static analysis can help identify where there may be some serious problems. Matlab has a great static analyzer that can report when objects (e.g. vectors, matrices) are growing inside of loops, for instance. It is terrible to find this only via dynamic analysis - you've already wasted execution time to discover something like this, and it's not always discernible if your execution context is pretty simple (e.g. just a few iterations, or small objects).
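
To make that concrete in R terms, here is a small sketch of the pattern such an analyzer would flag, and of why dynamic analysis only reveals it at scale:

```r
## The pattern a static analyzer would flag: an object growing inside a loop.
## Dynamically, the cost only becomes visible once n is large enough.
n <- 1e5

grow <- function() {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)   # reallocates and copies each time
  out
}

prealloc <- function() {
  out <- numeric(n)                          # allocated once, filled in place
  for (i in seq_len(n)) out[i] <- i^2
  out
}

system.time(grow())       # roughly quadratic in n
system.time(prealloc())   # linear, usually far faster
```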

As far as language-agnostic methods, you can look at:

  1. Valgrind & cachegrind
  2. Monitoring of disk I/O, dirty buffers, etc.
  3. Monitoring of RAM (Cachegrind is helpful, but you could also just monitor RAM allocation and the details of RAM usage; see the sketch after this list)
  4. Usage of multiple cores
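
Within R, one modest angle on item 3 is to snapshot memory use around the focal call; note that `Rprofmem()` only records allocations if R was built with `--enable-memory-profiling`, and `lm()` is again just a placeholder for the package call:

```r
## Crude in-R memory monitoring. gc(reset = TRUE) clears the "max used"
## counters so the next gc() call reports the peak reached by the focal
## computation; Rprofmem() logs individual allocations, but only if R was
## configured with --enable-memory-profiling.
set.seed(1)
dat <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

gc(reset = TRUE)
fit <- lm(y ~ x, data = dat)       # placeholder for the package call
print(gc())                        # inspect the "max used" columns

Rprofmem("memprof.out")
fit <- lm(y ~ x, data = dat)
Rprofmem(NULL)
```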

Is it something that can be built on top of unit testing, or something that is usually done separately?

This is hard to answer. For static analysis, it can occur before unit testing. For dynamic analysis, one may want to add more tests. Think of it as sequential design (i.e. from an experimental design framework): if the execution costs appear to be the same, within some statistical allowance for variation, then no further tests are needed. If, however, method B seems to have an average execution cost greater than method A, then one should perform more intensive tests.
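
A rough sketch of that sequential idea, using a rank-based test on repeated timings; `method_a()` and `method_b()` are hypothetical stand-ins for the old and new implementations:

```r
## Sequential-design flavour: compare timing samples from two versions and
## only escalate to more intensive testing when the difference looks real.
method_a <- function() sort(rnorm(1e5))
method_b <- function() sort(rnorm(1e5), method = "radix")

time_once <- function(f) system.time(f())[["elapsed"]]
a <- replicate(30, time_once(method_a))
b <- replicate(30, time_once(method_b))

## A rank-based test copes with the skew that timing data usually shows
wt <- wilcox.test(a, b)
if (wt$p.value < 0.05 && median(b) > median(a)) {
  message("possible regression: run a larger, more controlled experiment")
}
```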


Update 1: If I may be so bold, there's another question that I'd recommend including, which is: "What are some gotchas in comparing the execution time of two versions of a package?" This is analogous to assuming that two programs that implement the same algorithm should have the same intermediate objects. That's not exactly true (see this question - not that I'm promoting my own questions, here - it's just hard work to make things better and faster...leading to multiple SO questions on this topic :)). In a similar way, two executions of the same code can differ in time consumed due to factors other than the implementation.

So, some gotchas that can occur, either within the same language or across languages, within the same execution instance or across "identical" instances, which can affect runtime:

  1. Garbage collection - different implementations or languages can hit garbage collection under different circumstances. This can make two executions appear different, though it can be very dependent on context, parameters, data sets, etc. The GC-obsessive execution will look slower. (See the sketch after this list.)
  2. Caching at the level of the disk, motherboard (e.g. L1, L2, L3 caches), or other levels (e.g. memoization). Often, the first execution will pay a penalty.
  3. Dynamic voltage scaling - This one sucks. When there is a problem, this may be one of the hardest beasties to find, since it can go away quickly. It looks like cacheing, but it isn't.
  4. Any job priority manager that you don't know about.
  5. One method uses multiple cores or does some clever stuff about how work is parceled among cores or CPUs. For instance, getting a process locked to a core can be useful in some scenarios. One execution of an R package may be luckier in this regard, another package may be very clever...
  6. Unused variables, excessive data transfer, dirty caches, unflushed buffers, ... the list goes on.
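
To illustrate the first gotcha in R: `gcinfo(TRUE)` reports collections as they happen, so an unusually slow repetition can be attributed to GC rather than to the code change (a rough sketch, not a controlled benchmark):

```r
## Gotcha 1 in miniature: garbage collection can inflate individual timings.
## gcinfo(TRUE) prints a line whenever the collector runs, which helps
## attribute a slow repetition to GC rather than to a code change.
x <- replicate(50, rnorm(1e5), simplify = FALSE)

old <- gcinfo(TRUE)                    # report collections as they happen
timings <- replicate(10, system.time(lapply(x, function(v) v^2))[["elapsed"]])
gcinfo(old)                            # restore the previous setting

summary(timings)   # slow outliers often line up with the GC messages above
```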

The key question is: ideally, how should we test for differences in expected values, subject to the randomness created by order effects? Well, pretty simple: go back to experimental design. :)

When the empirical differences in execution times are different from the "expected" differences, it's great to have enabled additional system and execution monitoring so that we don't have to re-run the experiments until we're blue in the face.

answered Oct 04 '22 by Iterator