Memory access comparison

Question

Which one of the 2 is faster (C++)?

for(i=0; i<n; i++)
{
    sum_a = sum_a + a[i];
    sum_b = sum_b + b[i];
}

Or

for(i=0; i<n; i++)
{
    sum_a = sum_a + a[i];
}
for(i=0; i<n; i++)
{
    sum_b = sum_b + b[i];
}

I am a beginner so I don't know whether this makes sense, but in the first version, array 'a' is accessed, then 'b', which might lead to many memory switches, since arrays 'a' and 'b' are at different memory locations. But in the second version, whole of array 'a' is accessed first, and then whole of array 'b', which means continuous memory locations are accessed instead of alternating between the two arrays.

Does this make any difference between the execution time of the two versions (even a very negligible one)?

MaciekGrynda · Accepted Answer

I don't think there is correct answer to this question. In general, second version has more twice as much iterations (CPU execution overhead), but worse access to memory (Memory access overhead). Now imagine you run this code on PC that has slow clock, but insanely good cache. Memory overhead gets reduced, but since clock is slow running same loop twice makes execution much longer. Other way around: fast clock, but bad memory - running two loops is not a problem, so it's better to optimize for memory access.

Here is cool example on how you can profile your app: Link

eerorika · Answer

Which one of the 2 is faster (C++)?

Either. It depends on

The implementation of operator+ and operator[] (in case they are overloaded)
Location of the arrays in memory (adjacent or not)
Size of the arrays
Size of the cpu caches
Associativity of caches
Cache speed in relation to memory speed
Possibly other factors

As Revolver_Ocelot mentionend in their observation in a comment, some compilers may even transform the written loop into the other form.

Does this make any difference between the execution time of the two versions (even a very negligible one)?

It can make a difference. The difference may be significant or negligible.

Your analysis is sound. Memory access is typically much slower than cache, and jumping between two memory locations may cause cache thrashing ^† in some situations. I would recommend using the separated approach by default, and only combine the loops if you have measured it to be faster on your target CPU.

^† As MSalters points out thrashing shouldn't be a problem modern desktop processors (modern as in ~x86).

Memory access comparison

Tags:

c++

arrays

memory

Utkarsh

2 Answers

MaciekGrynda

eerorika

Recent Activity

Donate For Us

Memory access comparison

Tags:

c++

arrays

memory

Utkarsh

2 Answers

MaciekGrynda

eerorika

Related questions

Recent Activity

Donate For Us