Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is std::unordered_set contiguous (like std::vector)?

I'm storing pointers in std::unordered_set. I do this because I don't want any duplicates (I delete the pointers in the collection, so if there is a duplicate, I will attempt to delete an already deleted pointer). I loop heavily through these sets, and since I know std::vector is the fastest container for looping (contiguous memory), I wondered if std::unordered_set does the same.

If it doesn't, would using a std::vector and checking if the pointer has already been deleted be faster?

like image 215
Vittorio Romeo Avatar asked Jan 17 '13 17:01

Vittorio Romeo


People also ask

What is std :: unordered_set?

std::unordered_set is an STL container and its introduced in C++11. It provides a functionality of a Set i.e. it can contains the unique elements only. unordered_set stores the elements internally using a Hash Table.

Is unordered_set faster than vector?

Instead, std::unordered_set can often have approximately 10% faster than std::vector for n≥4 in my experiments. And the std::set is always the worst in that range.

Is STD set contiguous?

Therefore no, it does not store objects in contiguous memory. References to elements of the set must remain valid upon insertion to it as well as erasure (except for references to the erased element). This requirement is incompatible with contiguous memory.

How does unordered_set work?

An unordered_set is implemented using a hash table where keys are hashed into indices of a hash table so that the insertion is always randomized.


4 Answers

Is std::unordered_set contiguous ?

The exact implementation of containers is not detailed by the standard... however the standard does prescribes a number of behaviors which constrains the actual representation.

For example, std::unordered_set is required to be memory stable: a reference to/address of an element is valid even when adding/removing other elements.

The only way to achieve this is by allocating elements more or less independently. It cannot be achieved with a contiguous memory allocation as such an allocation would necessarily be bounded, and thus could be overgrown with no possibility of re-allocating the elements in a bigger chunk.

like image 82
Matthieu M. Avatar answered Sep 17 '22 22:09

Matthieu M.


No it is not contiguous memory, but it's still really fast, thanks to a hash map.

Edit: fast for random access, if you mainly do loops, you should consider another container, I think.

Edit2: And you should profile so as to know if it's worth thinking about another container. ( Maybe you should optimize somewhere else... maybe).

like image 22
Stephane Rolland Avatar answered Sep 17 '22 22:09

Stephane Rolland


The fact that the following member functions are offered by std::unordered_map suggests that it is based on a hashed-table, perhaps separate chaining with linked lists.

bucket_count, hash_function, load_factor, max_load_count, rehash

Whether the elements are contiguous or not depends on the allocator. The default allocator for the unordered_map and list does not allocate the elements in contiguous memory. The memory for each element is allocated at the time of its insertion.

However, you can provide a custom allocator (such as a pool allocator) which may allocate the elements from a pre-allocated memory pool. Still, the logically adjacent elements in the data structure may not be physically adjacent in the memory.

So, if looping through all the elements is the most frequent operation, then the unordered_map may not be best solution. Running the dominant use cases through a profiler for all competing solutions would reveal the best solution.

In addition to that, unordered_map is not the best choice to loop for another reason. Note the word "unordered" in the name, it conveys that -- unlike list, vector, or map -- there is no order of the elements. For example, the member function rehash may change the relative order of the elements. In fact, rehashes are automatically performed by the container whenever its load factor is going to exceed the max_load_factor during any operation.

like image 38
Arun Avatar answered Sep 19 '22 22:09

Arun


std::unordered_set is supposed to be a hash map container, so we could assume it has a little performance penalty when comparing with std::vector.

But I think you must check out the actual profiling result if the unordered_set access is real hotspot.

If the STL implementation you are using is reasonable one, it should provide vector like specialization for the pointer or int type key. If it's true, the unordered_set specialized for the pointer type will behave much like the automatically growing/shrinking vector and performance difference will be unnoticeable.

like image 27
9dan Avatar answered Sep 16 '22 22:09

9dan