Extremely bizarre code generation in Visual C++, for nearly identical code; 3x speed difference

Question

The code below (reduced from my larger code, after my astonishment at how its speed paled in comparison with that of std::vector) has two peculiar features:

1. It runs more than three times faster when I make a very tiny modification to the source code (always compiling it with /O2 with Visual C++ 2010).

   Note: To make this a little more fun, I put a hint for the modification at the end, so you can spend some time figuring out the change yourself. The original code was ~500 lines, so it took me a heck of a lot longer to pin it down, since the fix looks pretty irrelevant to the performance.

2. It runs about 20% faster with /MTd than with /MT, even though the output loop looks the same!

The difference in the assembly code for the tiny-modification case is:

Loop without the modification (~300 ms):

00403383  mov         esi,dword ptr [esp+10h] 
00403387  mov         edx,dword ptr [esp+0Ch] 
0040338B  mov         dword ptr [edx+esi*4],eax 
0040338E  add         dword ptr [esp+10h],ecx 
00403392  add         eax,ecx 
00403394  cmp         eax,4000000h 
00403399  jl          main+43h (403383h)

Loop with /MTd (looks identical! but ~270 ms):

00407D73  mov         esi,dword ptr [esp+10h] 
00407D77  mov         edx,dword ptr [esp+0Ch] 
00407D7B  mov         dword ptr [edx+esi*4],eax 
00407D7E  add         dword ptr [esp+10h],ecx 
00407D82  add         eax,ecx 
00407D84  cmp         eax,4000000h 
00407D89  jl          main+43h (407D73h)

Loop with the modification (~100 ms!!):

00403361  mov         dword ptr [esi+eax*4],eax 
00403364  inc         eax  
00403365  cmp         eax,4000000h 
0040336A  jl          main+21h (403361h)

Now my question is, why should the changes above have the effects they do? It's completely bizarre!

Especially the first one -- it shouldn't affect anything at all (once you see the difference in the code), and yet it lowers the speed dramatically.

Is there an explanation for this?

#include <cstdio>
#include <ctime>
#include <algorithm>
#include <memory>
template<class T, class Allocator = std::allocator<T> >
struct vector : Allocator
{
    T *p;
    size_t n;
    struct scoped
    {
        T *p_;
        size_t n_;
        Allocator &a_;
        ~scoped() { if (p_) { a_.deallocate(p_, n_); } }
        scoped(Allocator &a, size_t n) : a_(a), n_(n), p_(a.allocate(n, 0)) { }
        void swap(T *&p, size_t &n)
        {
            std::swap(p_, p);
            std::swap(n_, n);
        }
    };
    vector(size_t n) : n(0), p(0) { scoped(*this, n).swap(p, n); }
    void push_back(T const &value) { p[n++] = value; }
};
int main()
{
    int const COUNT = 1 << 26;
    vector<int> vect(COUNT);
    clock_t start = clock();
    for (int i = 0; i < COUNT; i++) { vect.push_back(i); }
    printf("time: %d\n", (clock() - start) * 1000 / CLOCKS_PER_SEC);
}

Hint:

It has to do with the allocator.

Answer:

Change Allocator &a_ to Allocator a_.
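
For concreteness, here is a minimal sketch of what the class looks like with that change applied. It is wrapped in a namespace called fixed, the constructor parameter is renamed to capacity, and fill_and_sum is a hypothetical helper; none of those names appear in the original code. It also calls the one-argument allocate(n), on the assumption that a newer compiler is used, since the two-argument form was removed in C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

namespace fixed {

template<class T, class Allocator = std::allocator<T> >
struct vector : Allocator
{
    T *p;
    std::size_t n;
    struct scoped
    {
        T *p_;
        std::size_t n_;
        Allocator a_;  // was "Allocator &a_;" (the one change described above)
        ~scoped() { if (p_) { a_.deallocate(p_, n_); } }
        // Initializers are listed in declaration order. allocate(n) stands in
        // for the original allocate(n, 0).
        scoped(Allocator &a, std::size_t n) : p_(a.allocate(n)), n_(n), a_(a) { }
        void swap(T *&p, std::size_t &n)
        {
            std::swap(p_, p);
            std::swap(n_, n);
        }
    };
    // "capacity" is a renamed parameter (the original shadows the member n).
    // The member n stays 0, so push_back starts writing at index 0.
    explicit vector(std::size_t capacity) : p(0), n(0)
    {
        scoped(*this, capacity).swap(p, capacity);
    }
    void push_back(T const &value) { p[n++] = value; }
};

// Hypothetical helper (not in the question): pushes 0..count-1 and returns
// the sum of the stored elements as a quick sanity check.
inline long long fill_and_sum(int count)
{
    assert(count >= 0);
    vector<int> vect(static_cast<std::size_t>(count));
    for (int i = 0; i < count; i++) { vect.push_back(i); }
    long long sum = 0;
    for (std::size_t i = 0; i < vect.n; i++) { sum += vect.p[i]; }
    // Like the original, vector has no destructor, so free the buffer by hand.
    vect.deallocate(vect.p, static_cast<std::size_t>(count));
    return sum;
}

} // namespace fixed
```

Calling fixed::fill_and_sum(1 << 26) from main, between two clock() calls, reproduces the shape of the original benchmark. Whether a given compiler shows the same speed difference is not something this sketch can promise.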

Michael Burr · Accepted Answer

For what it's worth, my speculation for the difference between /MT and /MTd is that the /MTd heap allocation paints the heap memory with a fill pattern for debugging purposes, which makes it more likely to already be paged in. That painting happens before you start the clock, so the timed loop does not pay for the first touch of each page.
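
To illustrate the kind of effect being speculated about, here is a rough sketch of touching a buffer one page at a time so that the memory is committed before any timing starts. The function name touch_pages and the default page size of 4096 bytes are assumptions for illustration; the real page size depends on the platform.

```cpp
#include <cassert>
#include <cstddef>

// Writes one byte per page so the OS has to back every page with real memory.
// This overwrites data, so it is only meant for a freshly allocated buffer.
// page_size defaults to 4096, which is an assumption, not a queried value.
inline std::size_t touch_pages(volatile char *buf, std::size_t bytes,
                               std::size_t page_size = 4096)
{
    assert(page_size > 0);
    std::size_t touched = 0;
    for (std::size_t off = 0; off < bytes; off += page_size)
    {
        buf[off] = 0;  // volatile write, so the store cannot be optimized away
        ++touched;
    }
    return touched;  // number of pages written to
}
```

In the benchmark this would be called on the vector's buffer, cast to a char pointer, right after construction and before clock() is first read.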

If you 'pre-heat' the vector allocation, you get the same numbers for /MT and /MTd:

vector<int> vect(COUNT);

// make sure vect's memory is warmed up
for (int i = 0; i < COUNT; i++) { vect.push_back(i); }
vect.n = 0; // clear the vector

clock_t start = clock();
for (int i = 0; i < COUNT; i++) { vect.push_back(i); }
printf("time: %d\n", (clock() - start) * 1000 / CLOCKS_PER_SEC);

lenx.wei · Answer

It's strange that Allocator& will break the alias chain while Allocator will not.

You can try

for(int i=vect.n; i<COUNT;++i){
    ...
}

to ensure that i and n stay synchronized. This makes the loop much easier for VC++ to optimize.
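
A different workaround in the same spirit, not the one suggested above, is to copy the pointer and the size into locals before the loop, so the compiler does not have to reload them from memory on every iteration the way the slow assembly listing does. This is only a sketch: int_buffer and fill_hoisted are hypothetical names standing in for the question's vector, and whether it helps on a particular compiler is an assumption that would need measuring.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the question's vector: a buffer pointer and a size.
struct int_buffer
{
    int *p;
    std::size_t n;
};

// Appends 0..count-1 using local copies of p and n, then writes n back once.
// The caller must make sure the buffer has room for count more elements.
inline void fill_hoisted(int_buffer &v, int count)
{
    assert(count >= 0);
    int *p = v.p;          // local copy: cannot alias the stores through p
    std::size_t n = v.n;   // local copy: can live in a register
    for (int i = 0; i < count; i++) { p[n++] = i; }
    v.n = n;               // single write-back after the loop
}
```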
