Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

c++ Why std::multimap is slower than std::priority_queue

I implemented an algorithm where I make use of an priority queue. I was motivated by this question: Transform a std::multimap into std::priority_queue

I am going to store up to 10 million elements with their specific priority value.

I then want to iterate until the queue is empty. Every time an element is retrieved it is also deleted from the queue.

After this I recalculate the elements pririty value, because of previous iterations it can change.

If the value did increase I am inserting the element againg into the queue. This happens more often dependent on the progress. (at the first 25% it does not happen, in the next 50% it does happen, in the last 25% it will happen multiple times).

After receiving the next element and not reinserting it, I am going to process it. This for I do not need the priority value of this element but the technical ID of this element.

This was the reason I intuitively had chosen a std::multimap to achieve this, using .begin() to get the first element, .insert() to insert it and .erase() to remove it. Also, I did not intuitively choose std::priority_queue directly because of other questions to this topic answering that std::priority_queue most likely is used for only single values and not for mapped values.

After reading the link above I reimplemented it using priority queue analogs to the other question from the link. My runtimes seem to be not that unequal (about an hour on 10 mio elements). Now I am wondering why std::priority_queue is faster at all.

I actually would expect to be the std::multimap faster because of the many reinsertions. Maybe the problem is that there are too many reorganizations of the multimap?

like image 617
Kaspatoo Avatar asked Jan 23 '17 13:01

Kaspatoo


People also ask

Which is faster priority queue or multiset?

Hmhmh, as you see multiset is just the same performance as multimap and priority_queue is the most fastest (around 43% faster).

What is the primary difference between Multimap and map?

The map and the multimap are both containers that manage key/value pairs as single components. The essential difference between the two is that in a map the keys must be unique, while a multimap permits duplicate keys.

Which is faster priority queue or set?

PS - priority queue is 30 times faster than set.

Is Multimap sorted in C++?

Multimap is an associative container that contains a sorted list of key-value pairs, while permitting multiple entries with the same key. Sorting is done according to the comparison function Compare , applied to the keys. Search, insertion, and removal operations have logarithmic complexity.


2 Answers

To summarize: your runtime profile involves both removing and inserting elements from your abstract priority queue, with you trying to use both a std::priority_queue and a std::multimap as the actual implementation.

Both the insertion into a priority queue and into a multimap have roughly equivalent complexity: logarithmic.

However, there's a big difference with removing the next element from a multimap versus a priority queue. With a priority queue this is going to be a constant-complexity operation. The underlying container is a vector, and you're removing the last element from the vector, which is going to be mostly a nothing-burger.

But with a multimap you're removing the element from one of the extreme ends of the multimap.

The typical underlying implementation of a multimap is a balanced red/black tree. Repeated element removals from one of the extreme ends of a multimap has a good chance of skewing the tree, requiring frequent rebalancing of the entire tree. This is going to be an expensive operation.

This is likely to be the reason why you're seeing a noticeable performance difference.

like image 147
Sam Varshavchik Avatar answered Oct 12 '22 06:10

Sam Varshavchik


I think the main difference comes form two facts:

  1. Priority queue has a weaker constraint on the order of elements. It doesn't have to have sorted whole range of keys/priorities. Multimap, has to provide that. Priority queue only have to guarantee the 1st / top element to be largest.

So, while, the theoretical time complexities for the operations on both are the same O(log(size)), I would argue that erase from multimap, and rebalancing the RB-tree performs more operations, it simply has to move around more elements. (NOTE: RB-tree is not mandatory, but very often chosen as underlying container for multimap)

  1. The underlying container of priority queue is contiguous in memory (it's a vector by default).

I suspect the rebalancing is also slower, because RB-tree relies on nodes (vs contiguous memory of vector), which makes it prone to cache misses, although one has to remember that operations on heap are not done in iterative manner, it is hopping through the vector. I guess to be really sure one would have to profile it.

The above points are true for both insertions and erasues. I would say the difference is in the constant factors lost in the big-O notation. This is intuitive thinking.

like image 38
luk32 Avatar answered Oct 12 '22 07:10

luk32