Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to cope with slow vector<string> destructor?

Is there a well-known solution to the following problem?

  • You have a vector of a lot of strings

  • You happily populate it with a few hundred thousand strings, and it's fast

  • You manipulate your strings in arbitrary ways; life is good.

  • You're done with the vector; vector goes out of scope, and now you have to go grab some coffee and sit back while each string gets destroyed one-by-one.

Edit: Problem solved!

I just ran the code below on Linux on the same computer and it was fine, which led me to figure out the solution. It turned out to be with my system -- something I'd caused myself, a long time ago, but which I'd forgotten.
Upon fixing the problem, the time decreased dramatically, to even better than with GCC!

It's a good puzzle though, so instead of posting the answer, I'll do something else:
I'm not allowed to place a bounty on this question right now, but if you think you know the cause, give it a shot. If it's correct, I'll accept it and give you a nice bounty. (Remind me if I forget to give you the bounty!)
If no one knows then I'll just post it myself after some time passes.

Sample code:

I used to be as skeptical as anybody, but now I guess people do have a point when they the STL is slow!
This took 3.95 seconds on my laptop: (the shuffling is critical)

#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <string>
#include <vector>

int main()
{
    using namespace std;
    srand((unsigned)time(NULL));
    clock_t start;
    {
        vector<string> v(400000);
        for (size_t i = 0; i < v.size(); i++)
        {
            v[i].resize(16 + rand() % 32);  // some variation
        }

        // shuffle
        for (size_t i = 0; i < (size_t)pow((double)v.size(), 1.15); i++)
        {
            size_t j = rand() * (RAND_MAX + 1) + rand();
            swap(v[i % v.size()], v[j % v.size()]);
        }
        
        printf("Going out of scope...\n"); fflush(stdout);
        start = clock();
    }
    clock_t end = clock();
    printf("%u ms\n", (end - start) * 1000 / CLOCKS_PER_SEC);
    return 0;
}

It looks to me like this program is using some O(n2) algorithm internally, either in Visual C++ or in Windows. Not sure what's happening, but it's interesting...

like image 438
user541686 Avatar asked Jul 18 '12 01:07

user541686


People also ask

Is clearing vector C++ necessary?

The destructors of the elements are called and the used storage is deallocated. Note, that if the elements are pointers, the pointed-to objects are not destroyed. so no you don't have to call clear.

Is vector slower than array C++?

There is a myth that for run-time speed, one should use arrays. A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program.

Does vector clear call destructor?

std::vector<T>::clear() always calls the destructor of each element, but the destructor of a pointer is a no-op (or a pointer has a trivial destructor).

Are C++ vectors better than arrays?

Vector occupies more memory. Array is memory efficient data structure. Vector takes more time in accessing elements. Array access elements in constant time irrespective of their location as elements are arranged in a contiguous memory allocation.


1 Answers

Use custom allocator with deallocation en masse.

like image 73
Serj-Tm Avatar answered Oct 27 '22 00:10

Serj-Tm