Performance issue C++ - searching through an array

Tags:

performance

I have two versions of searching through an int array for a specific value.

The first version is the straight forward one

int FindWithoutBlock(int * Arr, int ArrLen, int Val)
{
    for ( int i = 0; i < ArrLen; i++ )
        if ( Arr[i] == Val )
          return i;
 return ArrLen;
}

The second version should be faster. The passed array needs to be one element larger than in the previous case. Say for an array with 5 values, you allocate six ints and then do the following

int FindWithBlock(int * Arr, int LastCellIndex, int Val)
{
    Arr[LastCellIndex]  = Val;

    int i;
    for ( i = 0 ; Arr[i] != Val; i++ );
    return i;
}

this version should be faster - you don't need to check array boundaries with each iteration through Arr.

Now the "issue". When running these functions 100K times on an array of 100K elements in Debug, the second version is roughly 2x faster. In Release however, the first version is approximately 6000x faster. And the question is why.

A program that demonstrates this is to be found at http://eubie.sweb.cz/main.cpp

Any insight is much appreciated. Daniel

355

asked May 25 '12 10:05

Daniel Bencik

1 Answers

Here are my results using DevStudio 2005:

Debug:

Without block: 25.109
With block: 19.703

Release:

Without block: 0
With block: 6.046

It is very important to run this from the command line and not from within DevStudio, DevStudio does something to affect the performance of the app.

The only way to know what's really happening is to look at the assembler code. Here's the assembler generated in release:-

FindWithoutBlock:
00401000  xor         eax,eax 
00401002  cmp         dword ptr [ecx+eax*4],0F4240h 
00401009  je          FindWithoutBlock+1Ah (40101Ah) 
0040100B  add         eax,1 
0040100E  cmp         eax,186A0h 
00401013  jl          FindWithoutBlock+2 (401002h) 
00401015  mov         eax,186A0h 
0040101A  ret

Note that the compiler has removed the ArrLen parameter and replaced it with a constant! It has also kept it as a function.

Here's what the compiler did with the other function (FindWithBlock):-

004010E0  mov         dword ptr [esp+38h],186A0h 
004010E8  mov         ebx,0F4240h 
004010ED  mov         dword ptr [esi+61A80h],ebx 
004010F3  xor         eax,eax 
004010F5  cmp         dword ptr [esi],ebx 
004010F7  je          main+0EFh (40110Fh) 
004010F9  lea         esp,[esp] 
00401100  add         eax,1 
00401103  cmp         dword ptr [esi+eax*4],ebx 
00401106  jne         main+0E0h (401100h) 
00401108  cmp         eax,186A0h 
0040110D  je          main+0F5h (401115h) 
0040110F  call        dword ptr [__imp__getchar (4020D0h)] 
00401115  sub         dword ptr [esp+38h],1 
0040111A  jne         main+0CDh (4010EDh)

Here, the function has been in-lined. The lea esp,[esp] is just a 7 byte nop to align the next instruction. The code checks index 0 separately to all the other indices but the main loop is definately tighter than the FindWithoutBlock version.

Hmmm. Here's the code that calls FindWithoutBlock:-

0040106F  mov         ecx,edi 
00401071  mov         ebx,eax 
00401073  call        FindWithoutBlock (401000h) 
00401078  mov         ebp,eax 
0040107A  mov         edi,186A0h 
0040107F  cmp         ebp,186A0h 
00401085  je          main+6Dh (40108Dh) 
00401087  call        dword ptr [__imp__getchar (4020D0h)] 
0040108D  sub         edi,1 
00401090  jne         main+5Fh (40107Fh)

Aha! The FindWitoutBlock function is only being called once! The compiler has spotted that the function will return the same value every time and has optimised it to a single call. In the FindWithBlock, the compiler can't make the same assumption because you write to the array before the search, thus the array is (potentially) different for each call.

To test this, add the volatile keyword like this:-

int FindWithoutBlock(volatile int * Arr, int ArrLen, int Val)
{
    for ( int i = 0; i < ArrLen; i++ )
        if ( Arr[i] == Val )
            return i;

    return ArrLen;
}

int FindWithBlock(volatile int * Arr, int LastCellIndex, int Val)
{
    Arr[LastCellIndex]  = Val;

    int i;
    for ( i = 0 ; Arr[i] != Val; i++ );

    return i;
}

Doing this, both versions run in similar time (6.040). Seeing as the memory access is a major bottleneck, the more complex tests of the FindWithoutBlock don't impact on the overall speed.

answered Oct 17 '22 02:10

Skizz

Related questions
                            
                                Efficiency of vector index access vs iterator access
                            
                                How to read a specific amount of characters from a text file
                            
                                How to sleep with boost::chrono?
                            
                                why stl choose tree based map instead of hash based map?
                            
                                How to do TCP connection pooling in C/C++
                            
                                Call a Non Static Member Method from Another Method
                            
                                Is second part of map<..,..> stable?
                            
                                Using Eigen Library with OpenCV 2.3.1
                            
                                static class member of class's own type [duplicate]
                            
                                Pass an array as template type
                            
                                Wrong conversion from double to QString in Qt on ARM
                            
                                Non-linear programming library in C++
                            
                                Understanding buffering behavior of fwrite()
                            
                                What is the unsigned counterpart of ptrdiff_t?
                            
                                WaitForSingleObject - do threads waiting form a queue?
                            
                                c++11 optimal parameter passing
                            
                                what is the type of initialize list in C++ array?
                            
                                Make doesn't rebuild headers when changed
                            
                                C++ : constructor initialization list for an array?
                            
                                How to avoid downcast?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With