 

What is the difference between set and unordered_set in C++?

I came across this good question, which is similar but not at all the same, since it talks about Java, which has a different implementation of hash tables by virtue of having synchronized accessors/mutators: What are the differences between a HashMap and a Hashtable in Java?

So what is the difference between the C++ implementations of set and unordered_set? This question can of course be extended to map vs unordered_map and so on for other C++ containers.

Here is my initial assessment:

set: While the standard doesn't explicitly require it to be implemented as a tree, the time-complexity constraints on its operations (find/insert) mean it will in practice always be implemented as a tree, usually a red-black tree (as seen in GCC 4.8), which is height-balanced. Because the tree is height-balanced, find() has predictable time complexity.

Pros: Compact (compared to the other data structure in this comparison)

Con: Access time complexity is O(lg n)

unordered_set: While the standard doesn't explicitly require it to be implemented as a hash table, the time-complexity constraints on its operations (find/insert) mean it will in practice always be implemented as a hash table.

Pros:

  1. Faster (promises average-case O(1) for search)
  2. Easier to make the basic primitives thread-safe than with a tree-based data structure

Cons:

  1. Lookup is not guaranteed to be O(1). The theoretical worst case is O(n).
  2. Not as compact as a tree (for practical purposes the load factor is never 1).

Note: The O(1) for a hash table comes from the assumption that there are no collisions. Even with a load factor of 0.5, roughly every second insertion leads to a collision. The expected number of operations required to access an element grows with the load factor, so the more we reduce the number of operations, the sparser the hash table has to be. When the stored elements are of a size comparable to a pointer, that overhead is quite significant.
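The trade-off in the note above can be observed through the bucket interface of std::unordered_set. The following is a minimal sketch, not a benchmark: BucketStats and fill_and_measure are hypothetical names introduced here for illustration, and the exact bucket counts depend on the standard library implementation in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper type (not part of the standard library):
// a snapshot of the bucket statistics of an std::unordered_set.
struct BucketStats {
    std::size_t size;           // number of stored elements
    std::size_t buckets;        // number of buckets in the table
    std::size_t longest_chain;  // most elements found in a single bucket
    float load_factor;          // size / buckets
    float max_load_factor;      // threshold above which the table rehashes
};

// Hypothetical helper: insert the integers 0..n-1 with the given maximum
// load factor and report how sparse the resulting table is.
BucketStats fill_and_measure(std::size_t n, float max_lf) {
    assert(max_lf > 0.0f);  // a load factor limit must be positive
    std::unordered_set<int> s;
    s.max_load_factor(max_lf);
    for (std::size_t i = 0; i < n; ++i) {
        s.insert(static_cast<int>(i));
    }
    std::size_t longest = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        longest = std::max(longest, s.bucket_size(b));
    }
    return {s.size(), s.bucket_count(), longest,
            s.load_factor(), s.max_load_factor()};
}
```

Calling fill_and_measure(1000, 0.5f) should report at least 2000 buckets, since the container keeps size / bucket_count at or below the requested limit. A lower limit buys shorter chains at the cost of a larger bucket array.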

Did I miss any difference between the two that one should know about for performance analysis?

Ajeet Ganga asked Apr 18 '13 06:04



2 Answers

I think you've generally answered your own question. However, this:

Not as compact as a tree (for practical purposes the load factor is never 1).

is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) for a type T utilizes space that is equal to at least 2 * pointer_size + sizeof(T) + sizeof(bool). This may be 3 * pointer size depending on whether the tree contains a parent pointer for each tree node.

Compare this to a hash-map: there will be wasted array space for each hash map due to the fact that load factor < 1 as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no real reason not to), each element inserted take only sizeof(T) + pointer size.

Note that this analysis ignores any overhead which may come from extra space used by alignment.

For any element type T of small size (so, any basic type), the size of the pointers and other overhead dominates. At a load factor > 0.5 (for example), the std::unordered_set may indeed use less memory than the equivalent std::set.
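The arithmetic behind this comparison can be written out as a rough sketch. The two function names below are hypothetical, and the model makes the same simplifications as the analysis above: it ignores alignment padding and allocator overhead. It additionally assumes the bucket array stores exactly one pointer per bucket, which is an assumption and not something the standard specifies.

```cpp
#include <cstddef>

// Estimated bytes per element for a red-black tree node under the model
// above: two child pointers (plus an optional parent pointer), the payload,
// and a one-byte colour flag. Alignment padding is deliberately ignored.
template <typename T>
constexpr double tree_bytes_per_element(bool has_parent_pointer) {
    return (has_parent_pointer ? 3 : 2) * sizeof(void*)
           + sizeof(T) + sizeof(bool);
}

// Estimated bytes per element for a hash table chained with singly linked
// lists: the payload, one "next" pointer, and this element's share of the
// bucket array (ASSUMED to be one pointer per bucket, so 1 / load_factor
// pointers per element).
template <typename T>
constexpr double hash_bytes_per_element(double load_factor) {
    return sizeof(T) + sizeof(void*) + sizeof(void*) / load_factor;
}
```

Assuming a platform with 4-byte int and 8-byte pointers, the model gives 29 bytes per element for a tree with parent pointers and 28 bytes for the hash table at a load factor of 0.5, so the two are close to break-even there. Real node sizes will be larger once alignment is taken into account.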

The other big missing point is that iterating through a std::set is guaranteed to produce the elements ordered from smallest to largest, according to the given comparison function, while iterating through a std::unordered_set will return the values in a "random" order.
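A small sketch of the ordering difference, using the default comparison (std::less<int>). The function names are hypothetical and only serve the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <unordered_set>
#include <vector>

// Copies the values into a std::set and returns them in iteration order.
// Duplicates are dropped and the result is sorted ascending.
std::vector<int> in_set_order(const std::vector<int>& values) {
    std::set<int> s(values.begin(), values.end());
    std::vector<int> out(s.begin(), s.end());
    assert(std::is_sorted(out.begin(), out.end()));  // guaranteed by std::set
    return out;
}

// Copies the values into a std::unordered_set and returns them in iteration
// order. Duplicates are dropped, but the order is unspecified and may differ
// between standard library implementations.
std::vector<int> in_unordered_set_order(const std::vector<int>& values) {
    std::unordered_set<int> s(values.begin(), values.end());
    return std::vector<int>(s.begin(), s.end());
}
```

For the input {42, 7, 19, 7, 3}, in_set_order returns {3, 7, 19, 42}. in_unordered_set_order returns the same four values, but no particular order should be relied upon.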

Yuushi answered Oct 07 '22 22:10


Another difference (though not performance-related) is that set insertion doesn't invalidate iterators, while unordered_set insertion can if it triggers a rehash. In practice it's a pretty minor concern, since references to the actual elements remain valid.
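The following sketch illustrates both halves of that statement. The function names are hypothetical, and the check that a rehash happened relies on the assumption that an almost empty table starts with fewer than 10000 buckets, which holds for the common implementations but is not mandated by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_set>

// Takes the address of an element, then inserts enough elements to force a
// rehash. Iterators obtained before the rehash must not be used afterwards,
// but pointers and references to elements stay valid.
bool element_address_survives_rehash() {
    std::unordered_set<int> s;
    const int* p = &*s.insert(1).first;  // address of the stored element
    const std::size_t buckets_before = s.bucket_count();
    for (int i = 2; i <= 10000; ++i) {
        s.insert(i);
    }
    assert(s.bucket_count() > buckets_before);  // the table was rebuilt
    return p == &*s.find(1) && *p == 1;
}

// For std::set, iterators themselves stay valid across insertions.
bool set_iterator_survives_inserts() {
    std::set<int> s;
    std::set<int>::iterator it = s.insert(1).first;
    for (int i = 2; i <= 10000; ++i) {
        s.insert(i);
    }
    return it == s.find(1) && *it == 1;
}
```

Both functions are expected to return true. The practical rule is to hold pointers or references to elements of an unordered_set across insertions, not iterators.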

dhaffey answered Oct 07 '22 22:10