Is there a well-known solution to the following problem?
You have a vector of a lot of strings
You happily populate it with a few hundred thousand strings, and it's fast
You manipulate your strings in arbitrary ways; life is good.
You're done with the vector; it goes out of scope, and now you have to grab some coffee and sit back while each string gets destroyed one by one.
I just ran the code below on Linux on the same computer and it was fine, which led me to figure out the solution. It turned out to be an issue with my system -- something I'd caused myself a long time ago, but had since forgotten.
Upon fixing the problem, the time decreased dramatically, to even better than with GCC!
It's a good puzzle though, so instead of posting the answer, I'll do something else:
I'm not allowed to place a bounty on this question right now, but if you think you know the cause, give it a shot. If it's correct, I'll accept it and give you a nice bounty. (Remind me if I forget to give you the bounty!)
If no one knows then I'll just post it myself after some time passes.
I used to be as skeptical as anybody, but now I guess people do have a point when they say the STL is slow!
This took 3.95 seconds on my laptop: (the shuffling is critical)
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <string>
#include <vector>

int main()
{
    using namespace std;
    srand((unsigned)time(NULL));
    clock_t start;
    {
        vector<string> v(400000);
        for (size_t i = 0; i < v.size(); i++)
        {
            v[i].resize(16 + rand() % 32);  // some variation
        }
        // shuffle
        for (size_t i = 0; i < (size_t)pow((double)v.size(), 1.15); i++)
        {
            // combine two rand() calls for a wider range;
            // widen to size_t first to avoid signed overflow in RAND_MAX + 1
            size_t j = (size_t)rand() * ((size_t)RAND_MAX + 1) + (size_t)rand();
            swap(v[i % v.size()], v[j % v.size()]);
        }
        printf("Going out of scope...\n"); fflush(stdout);
        start = clock();
    }
    clock_t end = clock();
    printf("%lu ms\n", (unsigned long)((end - start) * 1000 / CLOCKS_PER_SEC));
    return 0;
}
It looks to me like this program is using some O(n²) algorithm internally, either in Visual C++ or in Windows. Not sure what's happening, but it's interesting...
The destructors of the elements are called and the used storage is deallocated. Note that if the elements are pointers, the pointed-to objects are not destroyed. So no, you don't have to call clear().
There is a myth that for run-time speed one should use raw arrays. A std::vector can never be faster than an array, since it holds (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim, and effectively absent in any non-trivial program.
std::vector<T>::clear() always calls the destructor of each element, but a pointer has a trivial destructor, so destroying one is a no-op.
A vector occupies slightly more memory than a raw array, since it also stores bookkeeping such as its size and capacity. Both arrange their elements contiguously, however, so both access elements in constant time regardless of position.
Use custom allocator with deallocation en masse.