Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Loading an STL set with pre-sorted data, C++

Tags:

c++

set

stl

sorted

I'm working with C++ in Visual Studio 2010. I have an STL set, which I'm saving to file when my program shuts down. The next time the program starts up, I load the (sorted) data back into a set. I'm trying to optimize the loading process, and I'm having trouble. I suspect the problem is with frequent re-balancing, and I'm looking for a way to avoid that.

First, I did it with no optimization, using "set->insert (const value_type& x )"

Time: ~5.5 minutes

Then I tried using the version of insert() where you pass in a hint for the location of the insert():

iterator insert ( iterator position, const value_type& x );

Roughly, I did this:

set<int> My_Set;
set<int>::iterator It;
It = My_Set.insert (0);
for (int I=1; I<1000; I++) {
   It = My_Set.insert (It, I);  //Remember the previous insertion's iterator
   }

Time: ~5.4 minutes

Barely any improvement! I don't think the problem is with overhead in reading from file--commenting out the insert() reduces the time to 2 seconds. I don't think the problem is with overhead in copying my object--it's a Plain Old Data object with an int and a char.

The only thing I can think of is that the set is constantly re-balancing.

1.) Do you agree with my guess?

2.) Is there a way to "pause" the rebalancing while I load the set, and then rebalance once at the end? (Or... Would that even help?)

3.) Is there a smarter way to load the sorted data, i.e. not simply moving from lowest to highest? Perhaps alternating my insertions so that it doesn't have to balance often? (Example: Insert 1, 1000, 2, 999, 3, 998,...)

like image 959
Jugulum Avatar asked Mar 23 '11 20:03

Jugulum


People also ask

Is set in STL sorted?

Sets are a type of associative container in which each element has to be unique because the value of the element identifies it. The values are stored in a specific sorted order i.e. either ascending or descending.

Can you sort std::set?

std::set is an associative container that contains a sorted set of unique objects of type Key . Sorting is done using the key comparison function Compare.

Does set insert elements in sorted order?

Set allows to traverse elements in sorted order whereas Unordered_set doesn't allow to traverse elements in sorted order.

How do you access the elements of a set?

You cannot access items in a set by referring to an index or a key. But you can loop through the set items using a for loop, or ask if a specified value is present in a set, by using the in keyword.


1 Answers

About how many elements we are talking?

I made an short test with 10.000.000 integers (prepared in an vector) and inserted them in three different ways into the set.

Prepare Input:

  std::vector<int> input;
  for(int i = 0; i < 10*1000*1000; ++i) {
     input.push_back(i);
  }


Insert into set item by item with insert:

Release: 2,4 seconds / Debug: 110,8 seconds

  std::set<int> mySet;
  std::for_each(input.cbegin(), input.cend(), [&mySet] (int value) {
     mySet.insert(value);
  });


Insert into set with insert(itBegin, itEnd):

Release: 0,9 seconds / Debug: 47,5 seconds

  std::set<int> mySet;
  mySet.insert(input.cbegin(), input.cend());

  // this is also possible - same execution time:
  std::set<int> mySet(input.cbegin(), input.cend());

So insertion could be heavily speeded up, but even the slow way should be far from several minutes.


EDIT:

I made an test with debug-mode meanwhile - wow - I know debug costs performance, but it is more than I thought. With 50.000.000 elements there is an bad alloc in debug-mode, so I updated my post to 10.000.000 elements and showed times for release and debug build.

You could see the immense differences here - 50 times with the faster solution.

Additionally the fast solution (insert(itBegin, itEnd)) seems to be linear to the amount of elements (with presorted data!). The previus test had five times more elements and the insert-time was reduced from 4,6 to 0,9 - about five times.

like image 114
MacGucky Avatar answered Sep 19 '22 15:09

MacGucky