In my application I have the following requirements -
The data structure will be populated just once with some values (not key/value pairs). The values may be repeated but I want the data structure to store them once only.
I will be iterating 100s of times through all the elements of the data structure created above. The order in which the elements appear in the iteration is immaterial.
Constraint 1 suggests that I will either have to use set or unordered_set as data is not in the form of key value pairs.
Now set insertion is costlier than unordered_set insertion but the data structure is populated only once in the beginning of my program.
I believe the deciding factor will be how fast I can iterate through all the elements of the data structure. I am not sure whether set or unordered_set will be faster for this purpose. I believe the standard makes no mention of this fact as this operation will be O(n) for either data structure. But I wonder for which data structure iterator.next() will be faster.
Comparing the timings for std::set , std::unordered_set with the original hash, and std::unordered_set with the extra hash, we see that for sufficiently large number of elements, std::unordered_set actually becomes faster than std::set : Another solution would be not to use a power of two for the number of buckets.
Instead, std::unordered_set can often have approximately 10% faster than std::vector for n≥4 in my experiments.
Set is an ordered sequence of unique keys whereas unordered_set is a set in which key can be stored in any order, so unordered. Set is implemented as a balanced tree structure that is why it is possible to maintain order between the elements (by specific tree traversal).
Average case: constant.
There are several approaches.
std::unordered_set
that has the fastest O(1)
lookup/insertion and O(N)
iteration (as has every container). If you have data that changes a lot, or requires a lot of random lookups, this is probably the fastest. But test.O(N)
copy to a std::vector
and gain from contiguous memory layout 100s of times. Test whether this is faster than a regular std::unordered_set
. boost::flat_set
which offers a std::set
interface with a std::vector
storage back-end (i.e. a contiguous memory layout that is very cache- and prefetch friendly). Again, test whether this gives a speed-up to the other two solutions.For the last solution, see the Boost documentation for some of the tradeoffs (it's good to be aware of all the other issues like iterator invalidation, move semantics and exception safety as well):
Boost.Container flat_[multi]map/set containers are ordered-vector based associative containers based on Austern's and Alexandrescu's guidelines. These ordered vector containers have also benefited recently with the addition of move semantics to C++, speeding up insertion and erasure times considerably. Flat associative containers have the following attributes:
- Faster lookup than standard associative containers
- Much faster iteration than standard associative containers
- Less memory consumption for small objects (and for big objects if shrink_to_fit is used)
- Improved cache performance (data is stored in contiguous memory)
- Non-stable iterators (iterators are invalidated when inserting and erasing elements)
- Non-copyable and non-movable values types can't be stored
- Weaker exception safety than standard associative containers (copy/move constructors can throw when shifting values in erasures and insertions)
- Slower insertion and erasure than standard associative containers (specially for non-movable types)
NOTE: with faster lookup, it is meant that a flat_set
does O(log N)
on contiguous memory rather than O(log N)
pointer chasing of a regular std::set
. Of course, a std::unordered_set
does O(1)
lookup, which will faster for large N
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With