Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory access comparison

Tags:

c++

arrays

memory

Which one of the 2 is faster (C++)?

for(i=0; i<n; i++)
{
    sum_a = sum_a + a[i];
    sum_b = sum_b + b[i];
}

Or

for(i=0; i<n; i++)
{
    sum_a = sum_a + a[i];
}
for(i=0; i<n; i++)
{
    sum_b = sum_b + b[i];
}

I am a beginner so I don't know whether this makes sense, but in the first version, array 'a' is accessed, then 'b', which might lead to many memory switches, since arrays 'a' and 'b' are at different memory locations. But in the second version, whole of array 'a' is accessed first, and then whole of array 'b', which means continuous memory locations are accessed instead of alternating between the two arrays.

Does this make any difference between the execution time of the two versions (even a very negligible one)?

like image 909
Utkarsh Avatar asked Jul 12 '16 11:07

Utkarsh


2 Answers

I don't think there is correct answer to this question. In general, second version has more twice as much iterations (CPU execution overhead), but worse access to memory (Memory access overhead). Now imagine you run this code on PC that has slow clock, but insanely good cache. Memory overhead gets reduced, but since clock is slow running same loop twice makes execution much longer. Other way around: fast clock, but bad memory - running two loops is not a problem, so it's better to optimize for memory access.

Here is cool example on how you can profile your app: Link

like image 156
MaciekGrynda Avatar answered Sep 27 '22 21:09

MaciekGrynda


Which one of the 2 is faster (C++)?

Either. It depends on

  • The implementation of operator+ and operator[] (in case they are overloaded)
  • Location of the arrays in memory (adjacent or not)
  • Size of the arrays
  • Size of the cpu caches
  • Associativity of caches
  • Cache speed in relation to memory speed
  • Possibly other factors

As Revolver_Ocelot mentionend in their observation in a comment, some compilers may even transform the written loop into the other form.

Does this make any difference between the execution time of the two versions (even a very negligible one)?

It can make a difference. The difference may be significant or negligible.

Your analysis is sound. Memory access is typically much slower than cache, and jumping between two memory locations may cause cache thrashing in some situations. I would recommend using the separated approach by default, and only combine the loops if you have measured it to be faster on your target CPU.

As MSalters points out thrashing shouldn't be a problem modern desktop processors (modern as in ~x86).

like image 38
eerorika Avatar answered Sep 27 '22 23:09

eerorika