Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

More efficient way to populate unordered_set?

I have an array of integers stored contiguously in memory and I want to add them all to a unordered_set collection.

Right now, I am adding them one at a time.

for (int i = 0; i < count; i++)
    collection.insert(pi[i]);

Is there any way to do this more efficiently?

I realize that the items are not stored contiguously within the collection, so it won't be as simply as just handing the array over to the collection. But can this be optimized somehow?

like image 577
Jonathan Wood Avatar asked Apr 12 '19 13:04

Jonathan Wood


People also ask

Is unordered_set faster than set?

For a small number of elements, lookups in a set might be faster than lookups in an unordered_set . Even though many operations are faster in the average case for unordered_set , they are often guaranteed to have better worst case complexities for set (for example insert ).

Is unordered_set faster than vector?

Instead, std::unordered_set can often have approximately 10% faster than std::vector for n≥4 in my experiments. And the std::set is always the worst in that range.

Can unordered_set have duplicates?

Unordered sets do not allow duplicates and are initialized using comma-delimited values enclosed in curly braces.

What is the time complexity of find in unordered_set?

The time complexity of set operations is O(log n) while for unordered_set, it is O(1).


1 Answers

unordered_set has a constructor that takes a range of elements to initially add them:

template< class InputIt >
unordered_set( InputIt first, InputIt last,
           size_type bucket_count = /*implementation-defined*/,
           const Hash& hash = Hash(),
           const key_equal& equal = key_equal(),
           const Allocator& alloc = Allocator() );

So you can just do collection = std::unordered_set{ p, p + count }; and leave it up to implementation.

As other user pointed out in comments, there's also an overload for insert that takes a range:

template< class InputIt >
void insert( InputIt first, InputIt last );

So, just like calling the constructor, you can do, collection.insert(p, p + count);

There's no guarantee that this overloads will be more efficient, since the complexity is linear in both overloads on average, as well as for just inserting elements one by one.

In fact, if we look into how insert is implemented in MSVC, it's very simple

template<class _Iter>
void insert(_Iter _First, _Iter _Last)
{   // insert [_First, _Last) at front, then put in place
    _DEBUG_RANGE(_First, _Last);
    for (; _First != _Last; ++_First)
        emplace(*_First);
}

so no optimization for this case.

I think, the best way to go about this is to call reserve, if you know a number of elements you are going to add and, if there are many collisions (which there won't be for integers), maybe modifying bucket_count.

like image 114
JustSomeGuy Avatar answered Oct 05 '22 11:10

JustSomeGuy