What is the difference between set and unordered_set in C++?

Tags: c++, algorithm, data-structures, c++11

I came across this good question, which is similar but not the same, since it is about Java, whose hash-table implementations differ by virtue of having synchronized accessors/mutators: What are the differences between a HashMap and a Hashtable in Java?

So what is the difference between the C++ implementations of set and unordered_set? This question can of course be extended to map vs. unordered_map, and so on for other C++ containers.

Here is my initial assessment:

set: While the standard doesn't explicitly require it to be implemented as a tree, the time-complexity constraints on its operations (find/insert) mean that in practice it is always implemented as a tree, usually a red-black tree (as seen in GCC 4.8), which is height-balanced. Because the tree is height-balanced, find() has a predictable time complexity.

Pros: Compact (compared to the other data structures under comparison)

Con: Access time complexity is O(lg n)
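To make the O(lg n) access concrete, here is a minimal sketch of tree-backed lookups on std::set. The helper names (contains, first_at_least, ordered_lookup_example) are made up for illustration; only find, lower_bound and end come from the standard library.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: membership test. std::set::find is O(log n).
bool contains(const std::set<int>& s, int key) {
    return s.find(key) != s.end();
}

// Hypothetical helper: smallest element >= key, or `fallback` if none.
// lower_bound is also O(log n) and relies on the elements being ordered.
int first_at_least(const std::set<int>& s, int key, int fallback) {
    auto it = s.lower_bound(key);
    return it != s.end() ? *it : fallback;
}

// Small self-check of the two helpers above.
void ordered_lookup_example() {
    const std::set<int> s{40, 10, 30, 20};
    assert(contains(s, 30));
    assert(!contains(s, 25));
    assert(first_at_least(s, 25, -1) == 30);
    assert(first_at_least(s, 41, -1) == -1);
}
```

Calling ordered_lookup_example() from main runs the checks. The lower_bound query is the kind of operation that depends on ordering, which is why it is a member of std::set but not of std::unordered_set.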

unordered_set: While the standard doesn't explicitly require it to be implemented as a hash table, the time-complexity constraints on its operations (find/insert) mean that in practice it is always implemented as a hash table.

Pros:

Faster (promises average-case O(1) for search)
Basic operations are easier to make thread-safe than those of a tree-based data structure

Cons:

Lookup is not guaranteed to be O(1). The theoretical worst case is O(n).
Not as compact as a tree (for practical purposes the load factor is never 1).

Note: The O(1) for a hash table comes from the assumption that there are no collisions. Even with a load factor of 0.5, roughly every second insertion leads to a collision. The number of operations required to access an element grows with the load factor, so the more we reduce the number of operations, the sparser the hash table has to be. When the stored elements are of a size comparable to a pointer, the overhead is quite significant.
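The trade-off between load factor and chain length can be inspected directly through the bucket interface of std::unordered_set. The following is a rough sketch; TableStats, inspect and make_set are hypothetical names, while max_load_factor, load_factor, bucket_count and bucket_size are standard members. Note that the default max_load_factor() is 1.0, so a sparser table has to be requested explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical summary of how full a hash table is.
struct TableStats {
    std::size_t size;           // number of stored elements
    std::size_t buckets;        // number of buckets
    float load_factor;          // size / buckets
    std::size_t longest_chain;  // largest number of elements in one bucket
};

// Walks every bucket and records the longest chain.
TableStats inspect(const std::unordered_set<int>& s) {
    TableStats st{s.size(), s.bucket_count(), s.load_factor(), 0};
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        if (s.bucket_size(b) > st.longest_chain) {
            st.longest_chain = s.bucket_size(b);
        }
    }
    return st;
}

// Builds a set holding 0..n-1 with a caller-chosen maximum load factor.
std::unordered_set<int> make_set(int n, float max_load) {
    assert(max_load > 0.0f);
    std::unordered_set<int> s;
    s.max_load_factor(max_load);  // the container rehashes to stay at or below this
    for (int i = 0; i < n; ++i) {
        s.insert(i);
    }
    return s;
}
```

Comparing inspect(make_set(1000, 0.5f)) with inspect(make_set(1000, 4.0f)) should show the sparser table using more buckets; the exact bucket counts and chain lengths depend on the standard library implementation and hash function.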

Did I miss any differences between map/set and their unordered counterparts that one should know about for performance analysis?

asked Apr 18 '13 06:04

Ajeet Ganga

2 Answers

I think you've generally answered your own question. However, this:

Not as compact as tree. (for practical purposes load factors is never 1)

is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) for a type T uses space equal to at least 2 * pointer_size + sizeof(T) + sizeof(bool). The pointer part may be 3 * pointer_size, depending on whether each tree node also stores a parent pointer.

Compare this to a hash map: there will be wasted array space in each hash map because the load factor is < 1, as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no real reason not to), each inserted element takes only sizeof(T) + pointer_size.

Note that this analysis ignores any overhead which may come from extra space used by alignment.

For any element type T which has a small size (so, any basic type), the size of the pointers and other overhead dominates. At a load factor of > 0.5 (for example), the std::unordered_set may indeed use less memory than the equivalent std::set.
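The estimate above can be turned into a small back-of-the-envelope calculation. This is only a sketch: TreeNode and ChainNode are hypothetical layouts invented for illustration, not what any particular standard library uses, and the figures ignore allocator bookkeeping. Unlike the analysis above, sizeof does include alignment padding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical red-black tree node with a parent pointer.
template <typename T>
struct TreeNode {
    TreeNode* left;
    TreeNode* right;
    TreeNode* parent;
    bool is_red;
    T value;
};

// Hypothetical node of a singly linked chain in a hash table.
template <typename T>
struct ChainNode {
    ChainNode* next;
    T value;
};

// Estimated bytes per element in the tree: one node per element.
template <typename T>
double tree_bytes_per_element() {
    return static_cast<double>(sizeof(TreeNode<T>));
}

// Estimated bytes per element in a chained hash table: one node plus the
// element's share of the bucket array, assumed to hold one pointer per bucket.
template <typename T>
double hash_bytes_per_element(double load_factor) {
    assert(load_factor > 0.0);
    return static_cast<double>(sizeof(ChainNode<T>)) +
           static_cast<double>(sizeof(void*)) / load_factor;
}
```

On a typical 64-bit platform with 8-byte pointers and 4-byte int, these layouts give 32 bytes per tree element, and 32 bytes per hash element at a load factor of 0.5, dropping to 24 bytes at a load factor of 1.0. Under these assumptions the break-even point sits at 0.5, in line with the figure quoted above.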

The other big missing point is that iterating through a std::set is guaranteed to produce the elements ordered from smallest to largest, based on the given comparison function, while iterating through a std::unordered_set will return the values in an unspecified ("random") order.
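A short sketch of the ordering guarantee follows. The names in_iteration_order and iteration_order_example are made up for this example; std::greater is used as one possible comparison function.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical helper: copies a container's elements in iteration order.
template <typename Container>
std::vector<int> in_iteration_order(const Container& c) {
    return std::vector<int>(c.begin(), c.end());
}

void iteration_order_example() {
    const std::set<int> ascending{3, 1, 2};
    const std::set<int, std::greater<int>> descending{3, 1, 2};
    const std::unordered_set<int> hashed{3, 1, 2};

    // std::set iterates in the order defined by its comparison function.
    assert((in_iteration_order(ascending) == std::vector<int>{1, 2, 3}));
    assert((in_iteration_order(descending) == std::vector<int>{3, 2, 1}));

    // std::unordered_set makes no promise about order, so only the set of
    // values can be checked, here by sorting a copy first.
    std::vector<int> seen = in_iteration_order(hashed);
    std::sort(seen.begin(), seen.end());
    assert((seen == std::vector<int>{1, 2, 3}));
}
```

If sorted output is needed from a std::unordered_set, the elements have to be copied out and sorted as above, which costs O(n lg n) per traversal.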

answered Oct 07 '22 22:10

Yuushi

Another difference (though not performance-related) is that set insertion doesn't invalidate iterators, while unordered_set insertion can if it triggers a rehash. In practice it's a pretty minor concern, since references to the actual elements remain valid.
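The difference between iterator and reference validity can be sketched as follows. The function names are hypothetical, and the element count of 10000 is just an arbitrary number large enough to force a rehash.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_set>

// Forces at least one rehash and reports whether a reference taken
// beforehand still refers to the same element afterwards.
bool reference_survives_rehash() {
    std::unordered_set<int> s{42};
    const int& ref = *s.find(42);
    const int* addr = &ref;
    const std::size_t buckets_before = s.bucket_count();

    for (int i = 0; i < 10000; ++i) {
        s.insert(i + 100);  // values 100..10099, never 42
    }
    assert(s.bucket_count() > buckets_before);  // a rehash has happened

    // Iterators obtained before the rehash must not be used any more,
    // but pointers and references to elements remain valid.
    return addr == &*s.find(42) && ref == 42;
}

// std::set: an iterator obtained before the insertions is still usable.
bool set_iterator_survives_insert() {
    std::set<int> s{42};
    const auto it = s.find(42);
    for (int i = 0; i < 10000; ++i) {
        s.insert(i + 100);
    }
    return *it == 42 && it == s.find(42);
}
```

In code that holds on to unordered_set iterators across insertions, calling reserve() up front with the expected element count is one way to avoid rehashes during the insertions that follow.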

answered Oct 07 '22 22:10

dhaffey
