Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

boost::unordered_map is... ordered?

Tags:

c++

boost

I have a boost::unordered_map, but it appears to be in order, giving me an overwhelming feeling of "You're Doing It Wrong". Why is the output to this in order? I would've expected the underlying hashing algorithm to have randomized this order:

#include <iostream>
#include <boost/unordered_map.hpp>

int main()
{
    boost::unordered_map<int, int> im;

    for(int i = 0; i < 50; ++i)
    {
        im.insert(std::make_pair(i, i));
    }

    boost::unordered_map<int, int>::const_iterator i;

    for(i = im.begin(); i != im.end(); ++i)
    {
        std::cout << i->first << ", " << i->second << std::endl;
    }

    return 0;
}

...gives me...

0, 0
1, 1
2, 2
...
47, 47
48, 48
49, 49

Upon examination of boost's source code:

inline std::size_t hash_value(int v)
{
    return static_cast<std::size_t>(v);
}

...which would explain it. The answers below hold the higher level thinking, as well, which I found useful.

like image 559
Thanatos Avatar asked Jun 14 '10 18:06

Thanatos


2 Answers

While I can't speak to the boost internals as I'm not a C++ guy, I can propose a few higher-level questions that may alleviate your concerns:

1) What are the guarantees of an "unordered" map? Say you have an ordered map, and you want to create a map that does not guarantee ordering. An initial implementation may simply use the ordered map. It's almost never a problem to provide stronger guarantees than you advertise.

2) A hash function is something that hashes X -> int. If you already have an integer, you could use the identity function. While it may not be the most efficient in all cases, it could explain the behavior you're seeing.

Basically, seeing behavior like this is not necessarily a problem.

like image 155
Steven Schlansker Avatar answered Sep 21 '22 20:09

Steven Schlansker


It is probably because your hashes are small integers. Hash tables usually calculate the number of bucket in which to put the item like this: bucket_index = hash%p where p is a prime number, which is the number of hashtable buckets, which is large enough to provide low frequency of collisions.

For integers hash equals to the value of the integer. You have a lot of data, so hashtable selects a large p. For any p larger than i, bucket_index = i%p = i.

When iterating, the hashtable returns items from its buckets in order of their indexes, which for you is the order of keys. :)

Try using larger numbers if you want to see some randomness.

like image 41
Rotsor Avatar answered Sep 25 '22 20:09

Rotsor