What is the underlying data structure of a STL set in C++?

3 Answers

Step debug into g++ 6.4 libstdc++ source

Did you know that with Ubuntu 16.04's default g++-6 package, or a GCC 6.4 build from source, you can step into the C++ standard library without any further setup?

By doing that we easily conclude that a red-black tree is used in this implementation.

This makes sense, since std::set can be traversed in order, which would not be efficient if a hash map were used.
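To illustrate the ordered-traversal point, here is a minimal sketch (the helper name iteration_order is made up for this example). It builds a std::set from unsorted input and copies it out by plain iteration:

```cpp
#include <set>
#include <vector>

// Hypothetical helper: builds a std::set from arbitrary input and returns
// the elements in the order a plain begin()..end() iteration visits them.
std::vector<int> iteration_order(const std::vector<int>& input) {
    std::set<int> s(input.begin(), input.end());
    // std::set iterates in ascending key order, and duplicates were
    // dropped on insertion, so no explicit sort is needed here.
    return std::vector<int>(s.begin(), s.end());
}
```

Calling iteration_order({3, 1, 2, 1}) yields the elements sorted and without the duplicate, which is the behaviour a balanced search tree provides naturally.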

main.cpp

#include <cassert>
#include <set>

int main() {
    std::set<int> s;
    s.insert(1);
    s.insert(2);
    assert(s.find(1) != s.end());
    assert(s.find(2) != s.end());
    assert(s.find(3) == s.end());
}

Compile and debug:

g++ -g -std=c++11 -O0 -o main.out main.cpp
gdb -ex 'start' -q --args main.out

Now, if you step into s.insert(1) you immediately reach /usr/include/c++/6/bits/stl_set.h:

#if __cplusplus >= 201103L
      std::pair<iterator, bool>
      insert(value_type&& __x)
      {
        std::pair<typename _Rep_type::iterator, bool> __p =
          _M_t._M_insert_unique(std::move(__x));
        return std::pair<iterator, bool>(__p.first, __p.second);
      }
#endif

which clearly just forwards to _M_t._M_insert_unique.

So we open the source file in vim and find the definition of _M_t:

      typedef _Rb_tree<key_type, value_type, _Identity<value_type>,
           key_compare, _Key_alloc_type> _Rep_type;
       _Rep_type _M_t;  // Red-black tree representing set.

So _M_t is of type _Rep_type and _Rep_type is a _Rb_tree.

OK, now that is enough evidence for me. If you don't believe that _Rb_tree is a red-black tree, step a bit further and read the algorithm.

unordered_set uses a hash table

Same procedure, but replace set with unordered_set in the code.

This makes sense: std::unordered_set does not have to be traversed in order, so the standard library chose a hash map instead of a red-black tree, since a hash map has better amortized insertion time complexity.

Stepping into insert leads to /usr/include/c++/6/bits/unordered_set.h:

      std::pair<iterator, bool>
      insert(value_type&& __x)
      { return _M_h.insert(std::move(__x)); }

So we open the source file in vim and search for _M_h:

      typedef __uset_hashtable<_Value, _Hash, _Pred, _Alloc>  _Hashtable;
      _Hashtable _M_h;

So hash table it is.
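You can also see the hash table from the outside, without a debugger, because std::unordered_set exposes a bucket interface. The following is a small sketch, with helper names invented for this example. It asks the container which bucket a key belongs to and walks that bucket with local iterators:

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: true if `key` is found by walking only the bucket
// that the container itself reports for that key.
bool found_in_reported_bucket(const std::unordered_set<int>& s, int key) {
    std::size_t b = s.bucket(key);  // bucket index for this key
    for (auto it = s.begin(b); it != s.end(b); ++it) {  // local iterators
        if (*it == key) return true;
    }
    return false;
}

// Hypothetical helper: true if the per-bucket sizes add up to size().
bool bucket_sizes_add_up(const std::unordered_set<int>& s) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        total += s.bucket_size(b);
    }
    return total == s.size();
}
```

std::set has no such bucket functions, which is one more hint that the two containers are built on different structures.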

std::map and std::unordered_map

Analogous to std::set vs std::unordered_set: What data structure is inside std::map in C++?

Performance characteristics

You could also infer the data structure used by timing them:

[Figure: insertion time plots for std::set and std::unordered_set, non-zoomed and zoomed]

Graph generation procedure and Heap vs BST analysis at: Heap vs Binary Search Tree (BST)

We clearly see for:

std::set, a logarithmic insertion time
std::unordered_set, a more complex hashmap pattern:
- on the non-zoomed plot, we clearly see the backing dynamic array doubling, as huge one-off spikes that increase linearly
- on the zoomed plot, we see that the times are basically constant and tend towards 250ns, therefore much faster than std::set, except for very small set sizes
  
  Several strips are clearly visible, and their inclination becomes smaller whenever the array doubles.
  
  I believe this is due to the average linked list walk within each bin increasing linearly. Then when the array doubles, we have more bins, so shorter walks.
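The array growth behind those spikes can be observed directly instead of inferred from timings. Here is a rough sketch (the function name is made up) that inserts 0..n-1 and records the bucket count each time it changes. The exact sequence of bucket counts is implementation-specific, so the code only records it and does not assume an exact doubling:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: inserts 0..n-1 into a std::unordered_set and returns
// every distinct bucket count seen, starting with the empty container's.
std::vector<std::size_t> bucket_count_history(int n) {
    std::unordered_set<int> s;
    std::vector<std::size_t> history{s.bucket_count()};
    for (int i = 0; i < n; ++i) {
        s.insert(i);
        // A changed bucket count means the insert triggered a rehash,
        // which is the expensive one-off operation.
        if (s.bucket_count() != history.back()) {
            history.push_back(s.bucket_count());
        }
    }
    return history;
}
```

Printing the result of bucket_count_history(10000) on your own toolchain shows at which sizes the rehashes happen. Between two rehashes the load factor rises steadily, which fits the explanation of longer walks within each bin.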


answered Oct 21 '22 08:10

Ciro Santilli 新疆再教育营六四事件法轮功郝海东

As KTC said, how std::set is implemented can vary -- the C++ standard simply specifies an abstract data type. In other words, the standard does not specify how a container should be implemented, just what operations it is required to support. However, most implementations of the STL do, as far as I am aware, use red-black trees or other balanced binary search trees of some kind (GNU libstdc++, for instance, uses red-black trees).

While you could theoretically implement a set as a hash table and get faster asymptotic performance (amortized O(key length) versus O(log n) for lookup and insert), that would require having the user supply a hash function for whatever type they wanted to store (see Wikipedia's entry on hash tables for a good explanation of how they work). As for an implementation of a binary search tree, you wouldn't want to use an array -- as Raul mentioned, you would want some kind of Node data structure.
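To make the difference in requirements concrete, here is a sketch with a made-up key type Point. The ordered container only needs a comparison, while the hashed one needs a hash function and an equality predicate. The hash combination below is purely illustrative, not a tuned one:

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

// Hypothetical user-defined key type.
struct Point {
    int x;
    int y;
};

// std::set only needs a strict weak ordering on the key.
struct PointLess {
    bool operator()(const Point& a, const Point& b) const {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    }
};

// std::unordered_set needs a user-supplied hash function...
struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Simple illustrative mix of the two coordinates.
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

// ...and an equality predicate to resolve keys that share a bucket.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

using OrderedPoints = std::set<Point, PointLess>;
using HashedPoints = std::unordered_set<Point, PointHash, PointEqual>;
```

Both containers reject the duplicate in {{1, 2}, {1, 2}, {0, 5}}, but only OrderedPoints guarantees that iteration starts at the smallest element.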

answered Oct 21 '22 09:10

Toli

You could implement a binary search tree by first defining a Node struct:

struct Node
{
  void *nodeData;
  Node *leftChild;
  Node *rightChild;
};

Then, you could define a root of the tree with another Node *rootNode;

The Wikipedia entry on Binary Search Tree has a pretty good example of how to implement an insert method, so I would also recommend checking that out.

In terms of duplicates, they are generally not allowed in sets, so you could either just discard that input, throw an exception, etc, depending on your specification.
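As a rough sketch of such an insert, here is a version that discards duplicates. It assumes an int payload instead of void * so that values can be compared directly, and the names IntNode, bst_insert, bst_contains and bst_destroy are made up for this example. It is a plain, unbalanced binary search tree, not the red-black tree a real std::set uses:

```cpp
// Assumption: int payload instead of void* so values can be compared with <.
struct IntNode {
    int value;
    IntNode* leftChild;
    IntNode* rightChild;
};

// Inserts `value` into the tree rooted at `root`.
// Returns true if a node was added, false if the value was a duplicate.
bool bst_insert(IntNode*& root, int value) {
    IntNode** link = &root;  // points at the child pointer we may overwrite
    while (*link != nullptr) {
        if (value < (*link)->value) {
            link = &(*link)->leftChild;
        } else if ((*link)->value < value) {
            link = &(*link)->rightChild;
        } else {
            return false;  // duplicate: discard, as a set would
        }
    }
    *link = new IntNode{value, nullptr, nullptr};
    return true;
}

// Returns true if `value` is stored in the tree.
bool bst_contains(const IntNode* root, int value) {
    while (root != nullptr) {
        if (value < root->value) root = root->leftChild;
        else if (root->value < value) root = root->rightChild;
        else return true;
    }
    return false;
}

// Frees every node of the tree.
void bst_destroy(IntNode* root) {
    if (root == nullptr) return;
    bst_destroy(root->leftChild);
    bst_destroy(root->rightChild);
    delete root;
}
```

Without rebalancing, inserting already sorted input degrades this tree into a linked list, which is the problem red-black trees address.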

answered Oct 21 '22 09:10

Raul Agrait


Tags:

c++

set

zebraman
