Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance issue C++ - searching through an array

I have two versions of searching through an int array for a specific value.

The first version is the straight forward one

int FindWithoutBlock(int * Arr, int ArrLen, int Val)
{
    for ( int i = 0; i < ArrLen; i++ )
        if ( Arr[i] == Val )
          return i;
 return ArrLen;
}

The second version should be faster. The passed array needs to be one element larger than in the previous case. Say for an array with 5 values, you allocate six ints and then do the following

int FindWithBlock(int * Arr, int LastCellIndex, int Val)
{
    Arr[LastCellIndex]  = Val;

    int i;
    for ( i = 0 ; Arr[i] != Val; i++ );
    return i;
}

this version should be faster - you don't need to check array boundaries with each iteration through Arr.

Now the "issue". When running these functions 100K times on an array of 100K elements in Debug, the second version is roughly 2x faster. In Release however, the first version is approximately 6000x faster. And the question is why.

A program that demonstrates this is to be found at http://eubie.sweb.cz/main.cpp

Any insight is much appreciated. Daniel

like image 355
Daniel Bencik Avatar asked May 25 '12 10:05

Daniel Bencik


People also ask

Is a list faster than an array?

An array is faster than a list in python since all the elements stored in an array are homogeneous i.e., they have the same data type whereas a list contains heterogeneous elements. Moreover, Python arrays are implemented in C which makes it a lot faster than lists that are built-in in Python itself.

Is array faster than list C#?

Short answer: In . NET List<T> and Array<T> have the same speed/performance because in . NET List is wrapper around Array .

Are lists less efficient than arrays?

Better use of Memory: From a memory allocation point of view, linked lists are more efficient than arrays. Unlike arrays, the size for a linked list is not pre-defined, allowing the linked list to increase or decrease in size as the program runs.

Are arrays more efficient than lists?

Arrays can store data very compactly and are more efficient for storing large amounts of data. Arrays are great for numerical operations; lists cannot directly handle math operations. For example, you can divide each element of an array by the same number with just one line of code.


1 Answers

Here are my results using DevStudio 2005:

Debug:

  • Without block: 25.109
  • With block: 19.703

Release:

  • Without block: 0
  • With block: 6.046

It is very important to run this from the command line and not from within DevStudio, DevStudio does something to affect the performance of the app.

The only way to know what's really happening is to look at the assembler code. Here's the assembler generated in release:-

FindWithoutBlock:
00401000  xor         eax,eax 
00401002  cmp         dword ptr [ecx+eax*4],0F4240h 
00401009  je          FindWithoutBlock+1Ah (40101Ah) 
0040100B  add         eax,1 
0040100E  cmp         eax,186A0h 
00401013  jl          FindWithoutBlock+2 (401002h) 
00401015  mov         eax,186A0h 
0040101A  ret              

Note that the compiler has removed the ArrLen parameter and replaced it with a constant! It has also kept it as a function.

Here's what the compiler did with the other function (FindWithBlock):-

004010E0  mov         dword ptr [esp+38h],186A0h 
004010E8  mov         ebx,0F4240h 
004010ED  mov         dword ptr [esi+61A80h],ebx 
004010F3  xor         eax,eax 
004010F5  cmp         dword ptr [esi],ebx 
004010F7  je          main+0EFh (40110Fh) 
004010F9  lea         esp,[esp] 
00401100  add         eax,1 
00401103  cmp         dword ptr [esi+eax*4],ebx 
00401106  jne         main+0E0h (401100h) 
00401108  cmp         eax,186A0h 
0040110D  je          main+0F5h (401115h) 
0040110F  call        dword ptr [__imp__getchar (4020D0h)] 
00401115  sub         dword ptr [esp+38h],1 
0040111A  jne         main+0CDh (4010EDh) 

Here, the function has been in-lined. The lea esp,[esp] is just a 7 byte nop to align the next instruction. The code checks index 0 separately to all the other indices but the main loop is definately tighter than the FindWithoutBlock version.

Hmmm. Here's the code that calls FindWithoutBlock:-

0040106F  mov         ecx,edi 
00401071  mov         ebx,eax 
00401073  call        FindWithoutBlock (401000h) 
00401078  mov         ebp,eax 
0040107A  mov         edi,186A0h 
0040107F  cmp         ebp,186A0h 
00401085  je          main+6Dh (40108Dh) 
00401087  call        dword ptr [__imp__getchar (4020D0h)] 
0040108D  sub         edi,1 
00401090  jne         main+5Fh (40107Fh) 

Aha! The FindWitoutBlock function is only being called once! The compiler has spotted that the function will return the same value every time and has optimised it to a single call. In the FindWithBlock, the compiler can't make the same assumption because you write to the array before the search, thus the array is (potentially) different for each call.

To test this, add the volatile keyword like this:-

int FindWithoutBlock(volatile int * Arr, int ArrLen, int Val)
{
    for ( int i = 0; i < ArrLen; i++ )
        if ( Arr[i] == Val )
            return i;

    return ArrLen;
}

int FindWithBlock(volatile int * Arr, int LastCellIndex, int Val)
{
    Arr[LastCellIndex]  = Val;

    int i;
    for ( i = 0 ; Arr[i] != Val; i++ );

    return i;
}

Doing this, both versions run in similar time (6.040). Seeing as the memory access is a major bottleneck, the more complex tests of the FindWithoutBlock don't impact on the overall speed.

like image 52
Skizz Avatar answered Oct 17 '22 02:10

Skizz