Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

->first/second to an empty map iterator to begin

Tags:

c++

iterator


I can't understand what is going on in this snippet.
The map reference states "If the container is empty, the returned iterator value shall not be dereferenced."
But what about some_map->begin()->second? on an empty map.
I thought it would be invalid, but this code prints '0'. Can anyone exaplin why?

int main()
{
 map<int,int> a;

 printf("%d",a.begin()->second);
 return 1;
}

Thanks!

like image 682
DsCpp Avatar asked Feb 06 '18 11:02

DsCpp


2 Answers

From this std::map::begin reference

If the container is empty, the returned iterator will be equal to end()

Then looking at this std::map::end reference:

This element acts as a placeholder; attempting to access it results in undefined behavior.

[Emphasis mine]

You are experiencing something called undefined behavior and that's really all there is to say about it. Just don't do anything stupid like that.

like image 134
Some programmer dude Avatar answered Oct 05 '22 07:10

Some programmer dude


As others have pointed out, your code has undefined behavior, and printing 0 is consistent with undefined behavior – as is any other behavior.

However, you may be interested in why this particular behavior occurs, as opposed to, e.g., a segfault. A likely reason is that your map implementation uses dummy nodes. These are actually quite popular for implementations of node-based containers in general.

For example, a map iterator may be a thin wrapper around a node pointer (e.g. to a node in a balanced binary search tree). Now, the obvious way to implement the end iterator is to use a null pointer here. However, it then becomes impossible to implement operator --() on the end iterator without e.g. a pointer to the container as well – and, worse, an additional branch, because now you must check in every call to this operator whether the node pointer is null (I'm abbreviating the code example a bit, by leaving out irrelevant template parameters):

template<typename U, typename V>
typename map<U,V>::iterator &map<U,V>::iterator::operator --()
{
    if ( node_ != nullptr )
    {
        // go to predecessor
        if ( node_->leftChild_ != nullptr )
        {
            for ( node_ = node_->leftChild_; node_->rightChild_ != nullptr; node_ = node_->rightChild_ );
        }
        else
        {
            node *n;
            do
            {
                n = node_;
                node_ = node->parent_;
                // may not go before begin()
                assert( node_ != nullptr );
            } while ( n == node_->leftChild_ );
        }
    }
    else
    {
        // point to last node - map must not be empty
        assert( container_->root_ != nullptr );
        for ( node_ = container_->root_; node_->rightChild_ != nullptr; node = node->rightChild_ );
    }
    return *this;
}

However, if you have a dummy node that is always the rightmost node in the tree, and implement end iterators as wrappers around a pointer to the dummy node, the nullness check and therefore the second branch become redundant and can be removed. Likewise, the use of the container_ pointer becomes completely unnecessary, and so the pointer itself can be removed from the iterator, saving space and reducing the cost to make and copy iterators. (Actually, since C++11 use of this "container pointer" is not even allowed anymore, because containers may be moved, and this does not invalidate the iterators, by definition. A valid solution would be more complex still. MINOR UPDATE: This may have been forbidden always because containers could be swapped without invalidating iterators.)

Now, this explains why an end() iterator may actually point to "real" memory. And so, if operator *() is implemented as:

template<typename U, typename V>
typename map<U,V>::reference map<U,V>::iterator::operator *() const
{
    // cannot dereference end iterator
    assert( hasSuccessor(node_) );
    return *node_;
}

As above, I added a debugging assertion here. How is hasSuccessor implemented? It actually has to ascend all the way to the top, in the worst case, to see if node_ or any of its ancestors has a right child. This check has prohibitive run time cost – i.e. it is literally forbidden by the standard. While its average complexity is only O(1), it has a worst case complexity of O(log N), but the standard requires Θ(1) even in the worst case for this operation. Granted, in a debug build, who cares, and there are other ways to implement the check, such as having a "dummy bit" somewhere in the node. The main point is that no such check is required anyway. Therefore, your attempt to dereferencing the iterator may give you a reference to the dummy node, and since it is "real" allocated memory, you can then proceed to read the mapped value. Note that this does not make the iterator "dereferencable" as far as the standard is concerned, it just means you get neither a segfault nor a program termination as an implementation detail that may change at any time without notice.

Now, there is possibly one more question as to why you get zero, as opposed to, let's say, -135, or some "random" 8-9 digit number for that matter. map is not allowed to invoke any default constructor, for example, that has side effects. Unless you use operator [], it is not even allowed to assume that your mapped type is default constructible. However, there are many other reasons why you may get a neat zero:

  • For types without a constructor w/ side effects, in particular for trivially constructible types such as int, map is indeed free to initialize them as in int{}, e.g. via placement new.
  • The implementation is also allowed to memset the dummy node to zeroes before use. This could be the simplest way to ensure that the pointers are zero.
  • However, the most likely explanation is that you didn't allocate and free other memory of similar size in your test program, so the allocator gave you some "fresh" memory from pages the process received from the operating system. And any multi-user operating system will make sure that memory given to a process cannot contain information from a previous process that might have held the memory. The usual approach is for the OS to memset the whole page(s) to zeroes before letting the process access them.
like image 26
Arne Vogel Avatar answered Oct 05 '22 07:10

Arne Vogel