Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How iterating over a std::set returns sorted results

The container std::set (or std::map) is a data structure that STL provides. In almost all compilers it is implemented as a R&B tree with guaranteed log(n) insertion, find and removal time.

https://en.wikipedia.org/wiki/Red%E2%80%93black_tree

In a red and black tree the elements are sorted based on the "less" operator of the stored element. So basically if a root is N + 1 , N will be on the left sub-tree while N + 2 will be on the right sub-tree and this ordering will be decided by the less operator.

My question is when executing the below code:

 set<unsigned long>::iterator it;
 for (it = myset.begin(); it != myset.end(); it++) {
     cout << *it;
 }

the elements are returned in a sorted order. How can this be possible considering the fact that the underlying data structure is a red and black tree? Is there a separate linked list stored to be able to iterate from the leftmost sub-tree to rightmost sub-tree? If not what is the mechanism behind this implementation using a R&B tree?

like image 358
ralzaul Avatar asked Nov 04 '15 16:11

ralzaul


People also ask

Does set store in sorted order C++?

A Set stores the elements in sorted order. Set stores unique elements. Elements can only be inserted and deleted but cannot be modified within the set.

Is std::set sorted?

std::set is an associative container that contains a sorted set of unique objects of type Key . Sorting is done using the key comparison function Compare. Search, removal, and insertion operations have logarithmic complexity.

How do you traverse a set in C++?

Iterating over a set using iterator. In this method, an iterator itr is created and initialized using begin() function which will point to the first element, and after every iteration, itr points to the next element in a set and it will continue to iterate until it reaches the end of the set.

Are values sorted in set C++?

Sets are a type of associative container in which each element has to be unique because the value of the element identifies it. The values are stored in a specific sorted order i.e. either ascending or descending.


2 Answers

We can find a definitive answer by looking at the source code (libstdc++ 5.2.1 in this case). This is how a tree node looks like:

// <libstdc++>/include/bits/stl_tree.h
struct _Rb_tree_node_base {
  typedef _Rb_tree_node_base* _Base_ptr;

  _Rb_tree_color    _M_color;
  _Base_ptr     _M_parent;
  _Base_ptr     _M_left;
  _Base_ptr     _M_right;

  // ...
}

So every node contains a color, and pointers to its parent and its left and right children. Incrementing is implemented as:

//  <libstdc++>/include/bits/stl_tree.h
struct _Rb_tree_iterator {
  _Self& operator++() {
    _M_node = _Rb_tree_increment(_M_node);
    return *this;
  }

// ...

private:
    _Base_ptr _M_node;
};

The actual increment is not in the public headers anymore, but in the compiled part of the library:

// <libstdc++>/src/c++98/tree.cc
static _Rb_tree_node_base* local_Rb_tree_increment(_Rb_tree_node_base* __x) throw ()
{
    if (__x->_M_right != 0) {
        __x = __x->_M_right;
        while (__x->_M_left != 0)
            __x = __x->_M_left;
     } else {
         _Rb_tree_node_base* __y = __x->_M_parent;
         while (__x == __y->_M_right)  {
             __x = __y;
             __y = __y->_M_parent;
         }
         if (__x->_M_right != __y)
             __x = __y;
     }
     return __x;
 }

So, ultimately, it's a textbook implementation of tree traversal: The iterator holds a pointer to the "current" node, and to get to the next node it goes up in the tree as long as it was coming from the right child. If it was coming from the left child, it will descend to leftmost child node of the right child.

like image 109
Benno Avatar answered Oct 09 '22 17:10

Benno


The iterator does an [in-order depth-first tree traversal].1 This is typically implemented in a recursive algorithm. Since the iterator usage can't be implemented recursively, the iterator internally keeps a stack of where it's been so that it can go back up the tree.

Update: thanks to Chris Dodd for pointing out that RB tree nodes have pointers to their parents, so the iterator can simply follow those up to the next element.

like image 24
Klitos Kyriacou Avatar answered Oct 09 '22 16:10

Klitos Kyriacou