This newbie question has interested me for a long time.
For example, suppose we have two situations:
I have a single loop that calls two functions (whether it is a while or a for loop does not matter).
for (int i = 0; i < 1000; i++)
{
    Function_1();
    Function_2();
}
I have two loops, each calling one of the functions.
for (int i = 0; i < 1000; i++)
{
    Function_1();
}

for (int i = 0; i < 1000; i++)
{
    Function_2();
}
I understand that the first one should work faster.
But what is the performance difference between these two situations (as a percentage)?
How much does performance decrease as the maximum loop count increases?
And which takes on the greater load in these situations: the processor or the RAM?
From a purely theoretical viewpoint, there's no difference between the two. Either way, it's O(N), and that's the end of that.
From a more practical viewpoint, caching can change that simple answer quite a bit. One may use the cache considerably more effectively than the other. That won't necessarily be the first one you've shown though.
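Because the answer depends entirely on what the two functions do and what data they touch, the only reliable way to put a percentage on it is to measure it on your own machine. A minimal timing sketch (written in C++ for concreteness; Function_1/Function_2 here are empty placeholders you would replace with your real code, and with empty bodies an optimizing compiler may well discard the loops entirely) might look like this:

#include <chrono>
#include <cstdio>

static void Function_1() { /* replace with real work */ }
static void Function_2() { /* replace with real work */ }

// Times an arbitrary callable and returns the elapsed wall-clock milliseconds.
template <typename F>
static double time_ms(F &&body)
{
    auto start = std::chrono::steady_clock::now();
    body();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main()
{
    double one_loop = time_ms([] {
        for (int i = 0; i < 1000; i++) { Function_1(); Function_2(); }
    });
    double two_loops = time_ms([] {
        for (int i = 0; i < 1000; i++) { Function_1(); }
        for (int i = 0; i < 1000; i++) { Function_2(); }
    });
    std::printf("one loop: %.3f ms, two loops: %.3f ms\n", one_loop, two_loops);
}

The percentage you get will vary wildly with what the functions actually do, so there is no general number to quote.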
On a real (modern) computer, it basically works out as a question of which makes more effective use of the cache.
That, in turn, depends on how much of what kind of memory is used by each of Function_1 and Function_2. If Function_1 and Function_2 each involve quite a lot of code, so that each of them fits in the L1 instruction cache on its own but the two together don't, then the second version may well be faster. In that circumstance, the first version (alternating between the two functions) has to reload each function's code from main memory on every iteration, so you load code from main memory roughly 2000 times. With the second version, you load the code for Function_1 from memory once, execute it 1000 times from the cache, then do the same with Function_2: a total of 2 loads from main memory.
In the other direction, let's assume that the code for both Function_1 and Function_2 can all fit in the instruction cache, but both Function_1 and Function_2 operate on the same data, and the total of that data is too large to fit in the data cache.
This will typically reverse the situation: executing Function_1 on a block of data, then executing Function_2 on the same block of data will only load that data from memory once, then do all the necessary computation on it, then load the next block of data, and so on. Each block of data is loaded from main memory only once.
In this case, the second version of your code may be slower by a factor of about 2: it'll load a block of memory and execute Function_1 on it. Then it'll load a second block of memory, and execute Function_1 on that, and so on. Once all the memory has been processed with Function_1, it'll go back through and load all the same blocks of memory to execute Function_2 on them.
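Here is a minimal, self-contained sketch of that data-bound case. The functions (step1/step2) and the array size are hypothetical stand-ins, chosen only so that the data is much larger than a typical data cache:

#include <vector>

// Hypothetical stand-ins for Function_1/Function_2: each makes a full pass
// over the same large array.
static void step1(std::vector<double> &data)
{
    for (double &x : data)
        x = x * 1.0001 + 1.0;
}

static void step2(std::vector<double> &data)
{
    for (double &x : data)
        x = x * 0.9999 - 1.0;
}

int main()
{
    // 128 MiB of doubles: much larger than a typical data cache.
    std::vector<double> data(16 * 1024 * 1024, 1.0);

    // Two separate loops: every cache line of 'data' is pulled in from main
    // memory twice, once per pass.
    step1(data);
    step2(data);

    // One combined loop: each cache line is loaded once, and both operations
    // run on it while it is still resident in cache.
    for (double &x : data)
    {
        x = x * 1.0001 + 1.0;
        x = x * 0.9999 - 1.0;
    }
}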
Of course, in a modern processor, it's not that simple either. We now frequently have 2 or 3 levels of cache, often with different eviction policies. The first level is typically split into instruction cache and data cache, but you usually also have at least one level that's unified. This doesn't eliminate the factors outlined above, but can cloud the picture considerably.
There's an entire area of research into things like cache-aware and cache-oblivious algorithms to help make intelligent choices for cases like yours. The cache-aware ones are based on (more detailed versions of) models like those above, choosing how to organize a computation to fit the cache's organization. Cache-oblivious algorithms aim instead at a relatively generic model of caching, providing good performance almost regardless of exactly how a specific cache is organized.
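As one concrete flavor of the cache-aware idea, you can often keep the two functions separate but strip-mine the loop so that each block of data stays in cache across both passes. A rough sketch, reusing the hypothetical step1/step2 from above; the block size is an assumption you would tune to the target machine:

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-block versions of the two operations.
static void step1(double *p, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        p[i] = p[i] * 1.0001 + 1.0;
}

static void step2(double *p, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        p[i] = p[i] * 0.9999 - 1.0;
}

int main()
{
    std::vector<double> data(16 * 1024 * 1024, 1.0);

    // Assumed block size: 2048 doubles (16 KiB), well within a typical L1 data cache.
    const std::size_t BLOCK = 2 * 1024;

    for (std::size_t start = 0; start < data.size(); start += BLOCK)
    {
        std::size_t n = std::min(BLOCK, data.size() - start);
        step1(data.data() + start, n);  // the block is loaded into cache here...
        step2(data.data() + start, n);  // ...and reused here before it is evicted
    }
}

This keeps the code for the two functions separate (good for the instruction cache) while each block of data is still loaded from main memory only once (good for the data cache).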