I came across this good question, which is similar but not at all same since it talks about Java, which has different implementation of hash-tables, by virtue of having synchronized accessor /mutators: What are the differences between a HashMap and a Hashtable in Java?
So what is the difference in C++ implementation of set
and unordered_set
? This question can be of course extended to map
vs unordered_map
and so on for other C++ containers.
Here is my initial assessment:
set
: While the standard doesn't explicitly ask it to be implemented as trees, the time-complexity constraint asked for its operations for find/insert, means it will always be implemented as a tree. Usually as RB tree (as seen in GCC 4.8), which is height-balanced. Since they are height balanced, they have predictable time-complexity for find()
Pros: Compact (compared to other DS in comparison)
Con: Access time complexity is O(lg n)
unordered_set
: While the standard doesn't explicitly asks it to be implemented as trees, the time-complexity constraint asked for its operations for find/insert, means it will always be implemented as a hash-table.
Pros:
Cons:
Note: The O(1), for hashtable comes from the assumption that there are no collision. Even with load-factor of .5, every second variable insertion is leading to collision. It could be observed that the load-factor of hash-table is inversely proportional to the number of operations required for accessing a element in it. More we reduce #operations, sparser hash-table. When the element stored are of size comparable to pointer, then overhead is quite significant.
Did I miss any difference between map/set for performance analysis that one should know?
Set is an ordered sequence of unique keys whereas unordered_set is a set in which key can be stored in any order, so unordered. Set is implemented as a balanced tree structure that is why it is possible to maintain order between the elements (by specific tree traversal).
An unordered_set is an Associative container that contains an unordered set of data inserted randomly. Each element may occur only once, so duplicates are not allowed. A user can create an unordered set by inserting elements in any order and an unordered set will return data in any order i.e. unordered form.
For a small number of elements, lookups in a set might be faster than lookups in an unordered_set . Even though many operations are faster in the average case for unordered_set , they are often guaranteed to have better worst case complexities for set (for example insert ).
Unordered set is an associative container that contains a set of unique objects of type Key. Search, insertion, and removal have average constant-time complexity. Internally, the elements are not sorted in any particular order, but organized into buckets.
I think you've generally answered your own question, however, this:
Not as compact as tree. (for practical purposes load factors is never 1)
is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) for a type T
utilizes space that is equal to at least 2 * pointer_size + sizeof(T) + sizeof(bool)
. This may be 3 * pointer size
depending on whether the tree contains a parent
pointer for each tree node.
Compare this to a hash-map: there will be wasted array space for each hash map due to the fact that load factor < 1
as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no real reason not to), each element inserted take only sizeof(T) + pointer size
.
Note that this analysis ignores any overhead which may come from extra space used by alignment.
For any element T
which has a small size (so, any basic type), the size of the pointers and other overhead dominates. At a load factor of > 0.5
(for example) the std::unordered_set
may indeed use up less memory than the equivalent std::set
.
The other big missing point is the fact that iterating through a std::set
is guaranteed to produce an ordering from smallest to largest, based on the given comparison function, while iterating through an std::unordered_set
will return the values in a "random" order.
Another difference (though not performance-related) is that set
insertion doesn't invalidate iterators, while unordered_set
insertion can if it triggers a rehash. In practice it's a pretty minor concern, since references to the actual elements remain valid.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With