
Why is using a std::multiset as a priority queue faster than using a std::priority_queue?

I tried to replace std::multiset with std::priority_queue, but I was disappointed with the speed results: the running time of the algorithm increased by 50%.

Here are the corresponding commands:

top() = begin();
pop() = erase(knn.begin());
push() = insert();
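
The mapping above can be sketched as a thin adapter. This is only an illustration: `MultisetQueue` and `demoMultisetQueue` are hypothetical names that do not appear in the question, and the sketch uses plain `int` elements with the default `std::less` ordering. It also shows why `erase(begin())` is used rather than `erase(value)`: the iterator overload removes a single element, so duplicates survive.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical adapter (not from the question): exposes a multiset
// through the three priority-queue operations listed above.
template <typename T>
class MultisetQueue {
public:
    const T& top() const { return *items_.begin(); }      // top()  = *begin()
    void pop() { items_.erase(items_.begin()); }           // pop()  = erase(begin())
    void push(const T& value) { items_.insert(value); }    // push() = insert()
    std::size_t size() const { return items_.size(); }

private:
    std::multiset<T> items_;
};

// Pushes duplicates and checks that pop() removes exactly one element.
std::size_t demoMultisetQueue() {
    MultisetQueue<int> q;
    q.push(2);
    q.push(1);
    q.push(1);
    assert(q.top() == 1);  // begin() is the smallest element under std::less
    q.pop();               // erase(iterator) removes a single element
    assert(q.top() == 1);  // the duplicate is still there
    return q.size();       // two elements remain
}
```

Calling `demoMultisetQueue()` returns the number of elements left after one `pop()`.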

I am surprised by the speed of the priority_queue implementation; I expected different results (better for PQ). Conceptually, the multiset is being used as a priority queue. Why do the priority queue and the multiset have such different performance, even with -O2?

Average of ten runs, MSVS 2010, Win XP, 32 bit, method findAllKNN2() (see below):

MS (std::multiset)
N           time [s]
100 000     0.5
1 000 000   8

PQ (std::priority_queue)
N           time [s]
100 000     0.8
1 000 000   12

What could cause these results? No other changes to the source code have been made. Thanks for your help.

MS Implementation:

template <typename Point>
struct TKDNodePriority
{
    KDNode <Point> *node;
    typename Point::Type priority;

    TKDNodePriority() : node ( NULL ), priority ( 0 ) {}
    TKDNodePriority ( KDNode <Point> *node_, typename Point::Type priority_ ) : node ( node_ ), priority ( priority_ ) {}

    // Inverted comparison: a larger priority value orders first
    bool operator < ( const TKDNodePriority <Point> &n1 ) const
    {
        return priority > n1.priority;
    }
};

template <typename Point>
struct TNNeighboursList
{
    typedef std::multiset < TKDNodePriority <Point> > Type;
};


template <typename Point>
template <typename Point2>
void KDTree2D <Point>::findAllKNN2 ( const Point2 * point, typename TNNeighboursList <Point>::Type & knn, unsigned int k, KDNode <Point> *node, const unsigned int depth ) const
{
    if ( node == NULL )
    {
        return;
    }

    // Descend first into the subtree on the query point's side of the split
    if ( point->getCoordinate ( depth % 2 ) <= node->getData()->getCoordinate ( depth % 2 ) )
    {
        findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
    }
    else
    {
        findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
    }

    // Squared distance between the query point and the current node
    typename Point::Type dist_q_node = ( node->getData()->getX() - point->getX() ) * ( node->getData()->getX() - point->getX() ) +
                                       ( node->getData()->getY() - point->getY() ) * ( node->getData()->getY() - point->getY() );

    if ( knn.size() == k )
    {
        // List is full: replace the element at begin() if this node is closer
        if ( dist_q_node < knn.begin()->priority )
        {
            knn.erase ( knn.begin() );
            knn.insert ( TKDNodePriority <Point> ( node, dist_q_node ) );
        }
    }
    else
    {
        knn.insert ( TKDNodePriority <Point> ( node, dist_q_node ) );
    }

    // Squared distance between the query point and the splitting line of the current node
    typename Point::Type dist_q_node_straight = ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) *
                                                ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) );

    // Visit the other subtree only if it can still contain a closer point
    typename Point::Type top_priority = knn.begin()->priority;
    if ( knn.size() < k || dist_q_node_straight < top_priority )
    {
        if ( point->getCoordinate ( node->getDepth() % 2 ) < node->getData()->getCoordinate ( node->getDepth() % 2 ) )
        {
            findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
        }
        else
        {
            findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
        }
    }
}

PQ implementation (slower, why?)

template <typename Point>
struct TKDNodePriority
{
    KDNode <Point> *node;
    typename Point::Type priority;

    TKDNodePriority() : node ( NULL ), priority ( 0 ) {}
    TKDNodePriority ( KDNode <Point> *node_, typename Point::Type priority_ ) : node ( node_ ), priority ( priority_ ) {}

    // Inverted comparison: a larger priority value orders first
    bool operator < ( const TKDNodePriority <Point> &n1 ) const
    {
        return priority > n1.priority;
    }
};

template <typename Point>
struct TNNeighboursList
{
    typedef std::priority_queue < TKDNodePriority <Point> > Type;
};


template <typename Point>
template <typename Point2>
void KDTree2D <Point>::findAllKNN2 ( const Point2 * point, typename TNNeighboursList <Point>::Type & knn, unsigned int k, KDNode <Point> *node, const unsigned int depth ) const
{
    if ( node == NULL )
    {
        return;
    }

    // Descend first into the subtree on the query point's side of the split
    if ( point->getCoordinate ( depth % 2 ) <= node->getData()->getCoordinate ( depth % 2 ) )
    {
        findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
    }
    else
    {
        findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
    }

    // Squared distance between the query point and the current node
    typename Point::Type dist_q_node = ( node->getData()->getX() - point->getX() ) * ( node->getData()->getX() - point->getX() ) +
                                       ( node->getData()->getY() - point->getY() ) * ( node->getData()->getY() - point->getY() );

    if ( knn.size() == k )
    {
        // Queue is full: replace the element at top() if this node is closer
        if ( dist_q_node < knn.top().priority )
        {
            knn.pop();
            knn.push ( TKDNodePriority <Point> ( node, dist_q_node ) );
        }
    }
    else
    {
        knn.push ( TKDNodePriority <Point> ( node, dist_q_node ) );
    }

    // Squared distance between the query point and the splitting line of the current node
    typename Point::Type dist_q_node_straight = ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) ) *
                                                ( point->getCoordinate ( node->getDepth() % 2 ) - node->getData()->getCoordinate ( node->getDepth() % 2 ) );

    // Visit the other subtree only if it can still contain a closer point
    typename Point::Type top_priority = knn.top().priority;
    if ( knn.size() < k || dist_q_node_straight < top_priority )
    {
        if ( point->getCoordinate ( node->getDepth() % 2 ) < node->getData()->getCoordinate ( node->getDepth() % 2 ) )
        {
            findAllKNN2 ( point, knn, k, node->getRight(), depth + 1 );
        }
        else
        {
            findAllKNN2 ( point, knn, k, node->getLeft(), depth + 1 );
        }
    }
}
Asked by Johny, May 05 '11 09:05


1 Answer

First of all, the author didn't provide a minimal example of the code that leads to the reported performance drop. Second, the question was asked 8 years ago, and I'm sure compilers have improved performance considerably since then.

I've written a benchmark in which I take the first element of the queue and push it back with another priority (simulating the push of a new element without creating one). This is repeated once per element in the container (kNodesCount elements), and the whole loop is run kRunsCount times. I compare priority_queue with multiset and multimap; I included multimap for a more precise comparison. This simple test is very close to the author's use case, and I've tried to reproduce the structs used in the question's code samples.

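#include <set>
#include <type_traits>
#include <vector>
#include <chrono>
#include <queue>
#include <map>
#include <iostream>
#include <cstdint>     // std::uint64_t
#include <cstdlib>     // std::rand
#include <functional>  // std::greater
#include <utility>     // std::move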

template<typename T>
struct Point {
    static_assert(std::is_integral<T>::value || std::is_floating_point<T>::value, "Incompatible type");
    using Type = T;

    T x;
    T y;
};

template<typename T>
struct Node {
    using Type = T;

    Node<T> * left;
    Node<T> * right;
    T data;
};

template <typename T>
struct NodePriority {
    using Type = T;
    using DataType = typename T::Type;

    Node<T> * node = nullptr;
    DataType priority = static_cast<DataType>(0);

    bool operator < (const NodePriority<T> & n1) const noexcept {
        return priority > n1.priority;
    }

    bool operator > (const NodePriority<T> & n1) const noexcept {
        return priority < n1.priority;
    }
};

// descending order by default
template <typename T>
using PriorityQueueList = std::priority_queue<T>;

// greater used because of ascending order by default
template <typename T>
using MultisetList = std::multiset<T, std::greater<T>>;

// greater used because of ascending order by default
template <typename T>
using MultimapList = std::multimap<typename T::DataType, T, std::greater<typename T::DataType>>;

struct Inner {
    template<template <typename> class C, typename T>
    static void Operate(C<T> & list, std::size_t priority);

    template<typename T>
    static void Operate(PriorityQueueList<T> & list, std::size_t priority) {
        if (list.size() % 2 == 0) {
            // top() returns a const reference, so this is a copy
            auto el = list.top();
            el.priority = priority;
            list.push(std::move(el));
        }
        else {
            list.pop();
        }
    }

    template<typename T>
    static void Operate(MultisetList<T> & list, std::size_t priority) {
        if (list.size() % 2 == 0) {
            // elements of a multiset are const, so this is a copy
            auto el = *list.begin();
            el.priority = priority;
            list.insert(std::move(el));
        }
        else {
            list.erase(list.begin());
        }
    }

    template<typename T>
    static void Operate(MultimapList<T> & list, std::size_t priority) {
        if (list.size() % 2 == 0) {
            // copy only the mapped value: the key of a stored pair is const
            // and must not be modified through const_cast
            auto el = list.begin()->second;
            el.priority = priority;
            list.emplace(priority, std::move(el));
        }
        else {
            list.erase(list.begin());
        }
    }
};
template<typename T>
void doOperationOnPriorityList(T & list) {
    for (std::size_t pos = 0, len = list.size(); pos < len; ++pos) {
        // move top element and update priority
        auto priority = std::rand() % 10;
        Inner::Operate(list, priority);
    }
}

template<typename T>
void measureOperationTime(T & list, std::size_t runsCount) {
    std::chrono::system_clock::time_point t1, t2;
    std::uint64_t totalTime(0);
    for (std::size_t i = 0; i < runsCount; ++i) {
        t1 = std::chrono::system_clock::now();
        doOperationOnPriorityList(list);
        t2 = std::chrono::system_clock::now();
        auto castedTime = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
        std::cout << "Run " << i << " time: " << castedTime << "\n";
        totalTime += castedTime;
    }

    std::cout << "Average time is: " << totalTime / runsCount << " ms" << std::endl;
}

int main() {
    // consts
    const int kNodesCount = 10'000'000;
    const int kRunsCount = 10;

    // prepare data
    PriorityQueueList<NodePriority<Point<int>>> neighboursList1;
    MultisetList<NodePriority<Point<int>>> neighboursList2;
    MultimapList<NodePriority<Point<int>>> neighboursList3;
    std::vector<Node<Point<int>>> nodes;
    // reserve up front so that &nodes.back() stays valid while the vector grows
    nodes.reserve(kNodesCount);
    for (auto i = 0; i < kNodesCount; ++i) {
        nodes.emplace_back(decltype(nodes)::value_type{ nullptr, nullptr, { 0,0 } });
        auto priority = std::rand() % 10;
        neighboursList1.emplace(decltype(neighboursList1)::value_type{ &nodes.back(), priority });
        neighboursList2.emplace(decltype(neighboursList2)::value_type{ &nodes.back(), priority });
        neighboursList3.emplace(decltype(neighboursList3)::value_type{ priority, { &nodes.back(), priority } });
    }

    // do operation on data
    std::cout << "\nPriority queue\n";
    measureOperationTime(neighboursList1, kRunsCount);
    std::cout << "\nMultiset\n";
    measureOperationTime(neighboursList2, kRunsCount);
    std::cout << "\nMultimap\n";
    measureOperationTime(neighboursList3, kRunsCount);

    return 0;
}
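
As a side note on the timing loop, intervals are often measured with `std::chrono::steady_clock`, which is monotonic, rather than `system_clock`, which can be adjusted while the program runs. The following is a minimal sketch of such a helper, not part of the benchmark above; `elapsedMs` and `demoElapsedMs` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` once and returns the elapsed time in milliseconds.
template <typename Work>
std::int64_t elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Times a small loop and checks that the callable ran and the result is sane.
bool demoElapsedMs() {
    volatile std::uint64_t sink = 0;  // volatile so the loop is not optimized away
    bool ran = false;
    const std::int64_t ms = elapsedMs([&] {
        for (std::uint64_t i = 0; i < 100000; ++i) sink = sink + i;
        ran = true;
    });
    assert(ran);
    assert(ms >= 0);
    return ran && ms >= 0;
}
```

With a helper like this, each run in the benchmark could be timed as `elapsedMs([&] { doOperationOnPriorityList(list); })`.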

I made a release build with /Ox using VS v15.8.9. Here are the results for 10'000'000 items in 10 runs:

Priority queue
Run 0 time: 764
Run 1 time: 933
Run 2 time: 920
Run 3 time: 813
Run 4 time: 991
Run 5 time: 862
Run 6 time: 902
Run 7 time: 1277
Run 8 time: 774
Run 9 time: 771
Average time is: 900 ms

Multiset
Run 0 time: 2235
Run 1 time: 1811
Run 2 time: 1755
Run 3 time: 1535
Run 4 time: 1475
Run 5 time: 1388
Run 6 time: 1482
Run 7 time: 1431
Run 8 time: 1347
Run 9 time: 1347
Average time is: 1580 ms

Multimap
Run 0 time: 2197
Run 1 time: 1885
Run 2 time: 1725
Run 3 time: 1671
Run 4 time: 1500
Run 5 time: 1403
Run 6 time: 1411
Run 7 time: 1420
Run 8 time: 1409
Run 9 time: 1362
Average time is: 1598 ms

As you can see, multiset performs about the same as multimap, and priority_queue is the fastest (it takes around 43% less time). So why does that happen?
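
For reference, the 43% figure follows from the average times reported above (900 ms for priority_queue, 1580 ms for multiset, 1598 ms for multimap). The last ratio expresses the same gap the other way round: multiset takes roughly 1.76 times as long as priority_queue.

```latex
\[
1 - \frac{900}{1580} \approx 0.43, \qquad
1 - \frac{900}{1598} \approx 0.44, \qquad
\frac{1580}{900} \approx 1.76
\]
```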

Let's start with priority_queue. The C++ standard leaves most implementation details of containers open, but in practice priority_queue is a binary heap over an underlying container (see the msvc and gcc implementations). With priority_queue you have no access to any element except the top: you can't iterate over the elements, access them by index, or even take the last element, which leaves some room for optimization. The average insert into a binary heap is O(1), with O(log n) only in the worst case, and deletion is O(log n), since the last element is moved to the top and then sifted down to restore the heap order.
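
The heap operations described above can be sketched with the standard heap algorithms on a `std::vector`. This is an illustration of the technique, not the actual msvc or gcc source; `VectorHeap` and `demoVectorHeap` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <initializer_list>
#include <vector>

// Hypothetical minimal max-heap, mirroring what std::priority_queue<int>
// typically does on top of a std::vector.
class VectorHeap {
public:
    void push(int value) {
        data_.push_back(value);                      // append at the bottom
        std::push_heap(data_.begin(), data_.end());  // sift up: O(log n) worst case
    }
    void pop() {
        std::pop_heap(data_.begin(), data_.end());   // move the top to the back, sift down
        data_.pop_back();                            // drop the old top
    }
    int top() const { return data_.front(); }
    bool empty() const { return data_.empty(); }

private:
    std::vector<int> data_;  // contiguous storage, no per-element allocation
};

// Pushes a few values and drains the heap; the result comes out in descending order.
std::vector<int> demoVectorHeap() {
    VectorHeap heap;
    for (int v : {5, 1, 9, 3, 7}) heap.push(v);
    std::vector<int> drained;
    while (!heap.empty()) {
        drained.push_back(heap.top());
        heap.pop();
    }
    assert(std::is_sorted(drained.begin(), drained.end(), std::greater<int>()));
    return drained;  // {9, 7, 5, 3, 1}
}
```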

Now multimap and multiset. Both are usually implemented as a red-black tree (see the msvc and gcc implementations), where the average insert is O(log n) and deletion is O(log n) as well.
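
One practical difference between the two layouts is memory allocation: a node-based tree typically allocates one node per element, while a vector-backed heap reallocates only when it grows. The sketch below counts allocations with a small custom allocator. `CountingAllocator`, `g_allocations` and the two measuring functions are hypothetical names for this illustration, and the exact counts depend on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <set>
#include <vector>

// Global counter of allocate() calls; hypothetical instrumentation for this sketch.
static std::size_t g_allocations = 0;

// Minimal C++11 allocator that counts allocations and otherwise defers to std::allocator.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Returns the number of allocations made while inserting `count` ints into a multiset.
std::size_t multisetAllocations(int count) {
    g_allocations = 0;
    std::multiset<int, std::less<int>, CountingAllocator<int>> tree;
    for (int i = 0; i < count; ++i) tree.insert(i % 10);
    return g_allocations;
}

// Same measurement for a vector-backed priority_queue.
std::size_t priorityQueueAllocations(int count) {
    g_allocations = 0;
    std::priority_queue<int, std::vector<int, CountingAllocator<int>>> heap;
    for (int i = 0; i < count; ++i) heap.push(i % 10);
    return g_allocations;
}
```

For 1000 insertions, `multisetAllocations(1000)` should report at least 1000 allocations, while `priorityQueueAllocations(1000)` should report only the handful caused by vector growth.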

From this point of view, priority_queue should not be slower than multiset or multimap. So, back to your question: a multiset used as a priority queue is NOT faster than priority_queue itself. There could be plenty of reasons for the result you observed, including an immature priority_queue implementation in an old compiler or incorrect usage of the structure (the question doesn't contain a minimal working example). Besides, the author gave few details about the build configuration, and optimization settings can make a significant difference.
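
One concrete thing to check when swapping the two containers is which end of the ordering each one exposes. With the same `operator<`, `multiset::begin()` is the smallest element while `priority_queue::top()` is the largest, so `top() = begin()` only holds if the comparison is flipped for one of them. The sketch below uses a hypothetical `Candidate` struct with the same inverted `operator<` as the question's TKDNodePriority. Whether this accounts for the timings in the question cannot be established from the code shown there.

```cpp
#include <cassert>
#include <initializer_list>
#include <queue>
#include <set>

// Hypothetical stand-in for the question's TKDNodePriority: same inverted operator<.
struct Candidate {
    double priority;
    bool operator<(const Candidate& other) const { return priority > other.priority; }
};

// Returns true when multiset::begin() and priority_queue::top() disagree.
bool frontsDiffer() {
    std::multiset<Candidate> tree;
    std::priority_queue<Candidate> heap;
    for (double p : {3.0, 1.0, 2.0}) {
        tree.insert(Candidate{p});
        heap.push(Candidate{p});
    }
    // multiset::begin() is the smallest element under operator<, i.e. priority 3.0 here.
    assert(tree.begin()->priority == 3.0);
    // priority_queue::top() is the largest element under operator<, i.e. priority 1.0 here.
    assert(heap.top().priority == 1.0);
    return tree.begin()->priority != heap.top().priority;
}
```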

UPDATE 1 upon @noɥʇʎԀʎzɐɹƆ request

Unfortunately I don't have access to a Linux environment right now, but I have mingw-w64 installed; version info: g++.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0. The processor is the same one I used for Visual Studio: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 2001 Mhz, 4 Core(s), 8 Logical Processor(s).

So the results for g++ -O2 are:

Priority queue
Run 0 time: 775
Run 1 time: 995
Run 2 time: 901
Run 3 time: 807
Run 4 time: 930
Run 5 time: 765
Run 6 time: 799
Run 7 time: 1151
Run 8 time: 760
Run 9 time: 780
Average time is: 866 ms

Multiset
Run 0 time: 2280
Run 1 time: 1942
Run 2 time: 1607
Run 3 time: 1344
Run 4 time: 1319
Run 5 time: 1210
Run 6 time: 1129
Run 7 time: 1156
Run 8 time: 1244
Run 9 time: 992
Average time is: 1422 ms

Multimap
Run 0 time: 2530
Run 1 time: 1958
Run 2 time: 1670
Run 3 time: 1390
Run 4 time: 1391
Run 5 time: 1235
Run 6 time: 1088
Run 7 time: 1198
Run 8 time: 1071
Run 9 time: 963
Average time is: 1449 ms

You may notice it's almost the same picture as for msvc.

UPDATE 2 thanks to @JorgeBellon

A quick-bench.com online benchmark link, check it yourself!

Would like to see any additions to my post, cheers!

Answered by Liastre, Oct 05 '22 05:10
