What can cause a program to run much faster the second time?

Three things to try:

  • Run it in a sampling profiler, including a "cold" run (first thing after a reboot). That should usually be enough.
  • Check memory usage: does it grow so high (even transiently) that the OS has to swap things out of RAM to make room for your app? That alone could explain what you're seeing. Also look at how much free RAM you have when you start your app (a minimal sketch of such a check follows this list).
  • Enable the system performance tools and check the I/O counters and file accesses, and make sure under FileMon / Process Explorer that you don't have some file or network access you've forgotten about (leftover log/test code).
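
A minimal sketch of the memory check in the second point, assuming a Linux machine where /proc/meminfo is available (a Python example; field names can differ between kernels, and on Windows you'd read the counters from Task Manager / Process Explorer instead):

    #!/usr/bin/env python3
    """Snapshot available RAM and swap before and after launching the program
    under test, to see whether its first run pushes the system into swapping."""
    import subprocess
    import sys

    def meminfo_kib(field):
        # Parse a single field (e.g. "MemAvailable") from /proc/meminfo, in KiB.
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return int(line.split()[1])
        return None

    CMD = sys.argv[1:] or ["./your_app"]   # hypothetical target program

    before_ram = meminfo_kib("MemAvailable")
    before_swap = meminfo_kib("SwapFree")
    subprocess.run(CMD, check=False)
    after_ram = meminfo_kib("MemAvailable")
    after_swap = meminfo_kib("SwapFree")

    print(f"MemAvailable: {before_ram} -> {after_ram} KiB")
    print(f"SwapFree:     {before_swap} -> {after_swap} KiB")
    # A big drop in SwapFree during/after the run suggests the OS had to swap
    # to make room for the app, which alone can explain a slow first run.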

Even (especially) for a very small command-line program, the issue can be the time it takes to load the process, link to dynamically-linked libraries, and so on. I believe modern operating systems avoid repeating a lot of this work when the same program is run twice at once, or repeatedly.
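
To get a feel for that startup cost, you can simply time repeated launches of the same command. The sketch below (a Python example) uses the Python interpreter itself as a stand-in; substitute your own executable:

    import subprocess
    import sys
    import time

    CMD = [sys.executable, "-c", "pass"]   # stand-in; replace with your program

    for i in range(5):
        t0 = time.perf_counter()
        subprocess.run(CMD, check=True)
        print(f"run {i + 1}: {(time.perf_counter() - t0) * 1000:.1f} ms")
    # The first run after a reboot typically pays for reading the executable and
    # its shared libraries from disk; later runs are served from the OS file
    # cache and converge to a lower, stable time.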

I wouldn't dismiss the CPU cache so easily, either. The level 1 cache is very relevant for inner loops, but much less so for a second run of the same application. On my cheap Athlon II X4 645 system, there's 64K + 64K (data + instruction) level 1 cache per core - not exactly a huge amount of memory. The level 2 cache is IIRC 512K per core, so less likely to be dirtied to complete irrelevance by the OS code needed to start up a new run of the program, calls to operating system services and standard libraries, etc. The level 3 cache (on CPUs that have it - my Athlon II doesn't, IIRC) is larger still, and there may be an even higher-level and larger cache provided by the motherboard/chipset.

There's at least one other kind of cache - branch prediction tables - though I'd have thought they'd be dirtied into irrelevance even more quickly than the level 1 cache.

I generally find that unit test programs run many times slower the first time. However, the larger and more complex the program, the less significant the effect.

For some time now, the performance of applications has often been considered non-deterministic. Although that isn't strictly true, performance is determined by so many hard-to-predict factors that it's a good model. For example, if the CPU gets a bit warm, the clock speed may be reduced to prevent overheating. And the temperature varies across different parts of the chip, with heat conducting across it in complex ways. As changes in clock speed and the different demands of different pieces of code alter the pattern of temperature changes, there's a clear potential for chaotic (as in chaos theory) behaviour.
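
If you want to watch that effect directly, you can poll the current clock speed and temperature while your program runs. The sysfs paths in this Python sketch are a Linux-specific assumption (the thermal zone index varies between machines); on Windows you'd use a monitoring tool such as CPU-Z or HWMonitor instead:

    import time

    FREQ = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"  # kHz
    TEMP = "/sys/class/thermal/thermal_zone0/temp"                  # milli-degrees C

    def read_int(path):
        with open(path) as f:
            return int(f.read().strip())

    for _ in range(10):
        print(f"{read_int(FREQ) / 1000:.0f} MHz, {read_int(TEMP) / 1000:.1f} C")
        time.sleep(1)
    # If the reported frequency drops as the temperature climbs during the first
    # run, thermal throttling is part of your timing variance.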

On some platforms, I wouldn't be surprised if the first run of the program got the processor into its "fast" (rather than cool/quiet) mode, and that meant the beginning of the second run benefitted from that speed boost as well as the end. However, this would be a tricky one - it would have to be a CPU-intensive program, and if your cooling is inadequate, the processor may then slow down again to avoid overheating.


I'd guess it's all your libraries/DLLs. These are usually loaded on-demand at run-time, so the first time your program runs the OS will have to read them all from disk. Once read, though, they'll stay loaded unless your system starts running low on memory. So if you run the same program several times in succession, the first run takes the brunt of the load time, and the other runs benefit from the pre-loaded libraries.
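
The same effect is easy to demonstrate with any large file: the first read has to come from disk, while the second is served from the OS file cache (assuming the file wasn't already cached and nothing evicts it in between). A Python sketch, with a hypothetical path:

    import time

    PATH = "/usr/lib/libsomething.so"   # hypothetical: any large, rarely-used file

    def timed_read(path):
        t0 = time.perf_counter()
        with open(path, "rb") as f:
            size = len(f.read())
        return time.perf_counter() - t0, size

    t1, size = timed_read(PATH)
    t2, _ = timed_read(PATH)
    print(f"{size} bytes: first read {t1 * 1000:.1f} ms, second {t2 * 1000:.1f} ms")
    # DLL/shared-library loading behaves the same way: the first process to map
    # the libraries pays for the disk I/O, later runs reuse the cached pages.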


I usually experience the contrary: for computation-intensive work (if the antivirus is not running), I only see a 5-10% difference between calls. For instance, the 6,000,000 regression tests run by our framework have a very consistent running time, and that is very disk- and CPU-intensive work.

I really don't believe it's a CPU cache or pipelining / branch prediction issue either, since both the processed data and the code seem to be consistent, as you wrote. If the antivirus is off, it may be about OS thread scheduling: did you try changing the process CPU affinity and priority?
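
A minimal sketch of pinning the process and changing its priority, assuming a Linux system (os.sched_setaffinity and os.nice are POSIX/Linux; on Windows you'd call SetProcessAffinityMask / SetPriorityClass, e.g. through the psutil package):

    import os
    import time

    # Pin the current process (pid 0 = self) to a single core, so every run
    # executes on the same core with the same cache and boost behaviour.
    os.sched_setaffinity(0, {0})

    # Raise the niceness (i.e. lower the priority); negative values need root.
    os.nice(5)

    t0 = time.perf_counter()
    total = sum(i * i for i in range(10_000_000))   # stand-in CPU-bound workload
    print(f"{time.perf_counter() - t0:.3f} s on cores {os.sched_getaffinity(0)}")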

This should be very specific to the process you are running. Without any actual source code to reproduce it, it's almost impossible to tell what's happening on your side. How many threads are there? What is the hardware configuration (isn't there some Intel CPU boost involved - are you using a laptop, and what are your energy settings)? Does it use CPU/FPU/MMX/SSE2 (e.g. MMX and FPU do not mix)? Does it move a lot of data, or process some existing data? Does your SW depend on external libraries (even some Windows libraries may need some time to initialize)? How do you use memory (did you try pre-allocating the memory; or, in a multi-threaded application, did you try using a scaling MM instead of FastMM4)?

I think using a sampling profiler may not help so much, since it will change the overall CPU core usage, but it's worth trying in all cases. I'd rather rely on logging profiling - see e.g. this class, or you may write your own timestamps to find where the timing changes in your app.
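
A very small version of that kind of logging profiler, in Python for illustration: wrap the suspect phases of your program and compare the logs of the first and second runs to see which phase actually got faster.

    import time
    from contextlib import contextmanager

    @contextmanager
    def timed(section, log=print):
        """Log how long the enclosed block took, with a wall-clock timestamp."""
        t0 = time.perf_counter()
        try:
            yield
        finally:
            dt = (time.perf_counter() - t0) * 1000
            log(f"{time.strftime('%H:%M:%S')} {section}: {dt:.1f} ms")

    with timed("load configuration"):
        time.sleep(0.05)                          # stand-in for real work
    with timed("main computation"):
        sum(i * i for i in range(1_000_000))      # stand-in for real work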

AFAIK it has always been written that, when benchmarking, the first run of an application shall never be taken into account. Computer systems are so complex nowadays that, the first time, all the internal (SW and HW) plumbing has to be purged - just as you shouldn't drink the first water coming out of your tap when you come back from a month of travel. ;)
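
In practice that just means treating the first run as a throwaway warm-up and only reporting the later ones; for example (a Python sketch):

    import statistics
    import time

    def benchmark(fn, runs=5):
        """Time fn runs+1 times, discard the first (cold) run, report the rest."""
        times = []
        for _ in range(runs + 1):
            t0 = time.perf_counter()
            fn()
            times.append(time.perf_counter() - t0)
        cold, warm = times[0], times[1:]
        print(f"cold run: {cold * 1000:.1f} ms (discarded)")
        print(f"warm runs: min {min(warm) * 1000:.1f} ms, "
              f"median {statistics.median(warm) * 1000:.1f} ms")

    benchmark(lambda: sum(i * i for i in range(2_000_000)))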


Other factors I can think of would be memory alignment (and the resulting cache-line fills), but say there are two cases, perfect alignment (fastest) and imperfect (slower): one would expect that to occur irregularly, depending on how memory happens to be laid out, rather than only on the first run.

Perhaps it has something to do with physical page layout? As far as I know, each memory-access goes through the MMU page table entries, so dispersed physical pages could be slower than consecutive pages. (Just a wild guess, this one)

Another thing I haven't seen mentioned yet is which core(s) your process is running on - especially on hyper-threaded CPUs, running on the slower of the two logical cores might have a negative impact. Try setting the processor affinity mask to one and the same core for every run, and see whether that affects the measured runtime differences between the first and subsequent runs.

By the way - how do you define 'first run'? Could it be that you've just compiled the executable? In that case (and I'm just guessing again here), some process (either the OS, a virus scanner, or even some rootkit) might be busy analyzing your executable's behaviour, which might be skipped once the executable has been analyzed before. You could try to prove that by changing some random unimportant byte of your executable between runs and seeing whether that hurts the runtime again.
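
One quick-and-dirty way to test that idea, sketched in Python below: make a copy of the executable with one random byte appended so its hash changes, then time the copy. This assumes the executable format tolerates trailing data (PE and ELF binaries usually do) and that nothing verifies a signature over the file - the path is hypothetical:

    import os
    import random
    import shutil
    import subprocess
    import time

    ORIGINAL = "your_app.exe"        # hypothetical path to the program under test

    def run_fresh_copy(original):
        base, ext = os.path.splitext(original)
        copy = f"{base}_freshcopy{ext}"
        shutil.copyfile(original, copy)
        with open(copy, "ab") as f:
            f.write(bytes([random.randrange(256)]))   # change the file's hash
        os.chmod(copy, 0o755)
        t0 = time.perf_counter()
        subprocess.run([os.path.abspath(copy)], check=False)
        print(f"{time.perf_counter() - t0:.2f} s")
        os.remove(copy)

    # If every never-seen-before copy is as slow as your original first run, an
    # on-access scanner analyzing new executables is a likely culprit.
    run_fresh_copy(ORIGINAL)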

Please post a summary once you figured out the cause(s) - this might help others too. Cheers!