I'm working with C++ in Visual Studio 2010. I have an STL set, which I'm saving to file when my program shuts down. The next time the program starts up, I load the (sorted) data back into a set. I'm trying to optimize the loading process, and I'm having trouble. I suspect the problem is with frequent re-balancing, and I'm looking for a way to avoid that.
First, I did it with no optimization, using "set->insert (const value_type& x )"
Time: ~5.5 minutes
Then I tried using the version of insert() where you pass in a hint for the location of the insert():
iterator insert ( iterator position, const value_type& x );
Roughly, I did this:
set<int> My_Set;
set<int>::iterator It;
It = My_Set.insert (0);
for (int I=1; I<1000; I++) {
It = My_Set.insert (It, I); //Remember the previous insertion's iterator
}
Time: ~5.4 minutes
Barely any improvement! I don't think the problem is with overhead in reading from file--commenting out the insert() reduces the time to 2 seconds. I don't think the problem is with overhead in copying my object--it's a Plain Old Data object with an int and a char.
The only thing I can think of is that the set is constantly re-balancing.
1.) Do you agree with my guess?
2.) Is there a way to "pause" the rebalancing while I load the set, and then rebalance once at the end? (Or... Would that even help?)
3.) Is there a smarter way to load the sorted data, i.e. not simply moving from lowest to highest? Perhaps alternating my insertions so that it doesn't have to balance often? (Example: Insert 1, 1000, 2, 999, 3, 998,...)
Sets are a type of associative container in which each element has to be unique because the value of the element identifies it. The values are stored in a specific sorted order i.e. either ascending or descending.
std::set is an associative container that contains a sorted set of unique objects of type Key . Sorting is done using the key comparison function Compare.
Set allows to traverse elements in sorted order whereas Unordered_set doesn't allow to traverse elements in sorted order.
You cannot access items in a set by referring to an index or a key. But you can loop through the set items using a for loop, or ask if a specified value is present in a set, by using the in keyword.
About how many elements we are talking?
I made an short test with 10.000.000 integers (prepared in an vector) and inserted them in three different ways into the set.
Prepare Input:
std::vector<int> input;
for(int i = 0; i < 10*1000*1000; ++i) {
input.push_back(i);
}
Release: 2,4 seconds / Debug: 110,8 seconds
std::set<int> mySet;
std::for_each(input.cbegin(), input.cend(), [&mySet] (int value) {
mySet.insert(value);
});
insert(itBegin, itEnd)
:
Release: 0,9 seconds / Debug: 47,5 seconds
std::set<int> mySet;
mySet.insert(input.cbegin(), input.cend());
// this is also possible - same execution time:
std::set<int> mySet(input.cbegin(), input.cend());
So insertion could be heavily speeded up, but even the slow way should be far from several minutes.
I made an test with debug-mode meanwhile - wow - I know debug costs performance, but it is more than I thought. With 50.000.000 elements there is an bad alloc in debug-mode, so I updated my post to 10.000.000 elements and showed times for release and debug build.
You could see the immense differences here - 50 times with the faster solution.
Additionally the fast solution (insert(itBegin, itEnd)
) seems to be linear to the amount of elements (with presorted data!). The previus test had five times more elements and the insert-time was reduced from 4,6 to 0,9 - about five times.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With