Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Same key, multiple entries for std::unordered_map?

Tags:

c++

I have a map inserting multiple values with the same key of C string type.

I would expect to have a single entry with the specified key.

However the map seems to take it's address into consideration when uniquely identifying a key.

#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>

typedef char const* const MyKey;

/// @brief Hash function for StatementMap keys
///
/// Delegates to std::hash<std::string>.
struct MyMapHash {
public:
    size_t operator()(MyKey& key) const {
        return std::hash<std::string>{}(std::string(key));
    }
};

typedef std::unordered_map<MyKey, int, MyMapHash> MyMap;

int main()
{
    // Build std::strings to prevent optimizations on the addresses of
    // underlying C strings.
    std::string key1_s = "same";
    std::string key2_s = "same";
    MyKey key1 = key1_s.c_str();
    MyKey key2 = key2_s.c_str();

    // Make sure addresses are different.
    assert(key1 != key2);

    // Make sure hashes are identical.
    assert(MyMapHash{}(key1) == MyMapHash{}(key2));

    // Insert two values with the same key.
    MyMap map;
    map.insert({key1, 1});
    map.insert({key2, 2});

    // Make sure we find them in the map.
    auto it1 = map.find(key1);
    auto it2 = map.find(key2);
    assert(it1 != map.end());
    assert(it2 != map.end());

    // Get values.
    int value1 = it1->second;
    int value2 = it2->second;

    // The first one of any of these asserts fails. Why is there not only one
    // entry in the map?
    assert(value1 == value2);
    assert(map.size() == 1u);
}

A print in the debugger shows that map contains two elements just after inserting them.

(gdb) p map
$4 = std::unordered_map with 2 elements = {
  [0x7fffffffda20 "same"] = 2,
  [0x7fffffffda00 "same"] = 1
}

Why does this happen if the hash function which delegates to std::hash<std::string> only takes it's value into account (this is asserted in the code)?

Moreover, if this is the intended behaviour, how can I use a map with C string as key, but with a 1:1 key-value mapping?

like image 956
Andrei Pavel Avatar asked Jul 17 '17 11:07

Andrei Pavel


3 Answers

The reason is that hash maps (like std::unordered_map) do not only rely on the hash function for determining if two keys are equal. The hash function is the first comparison layer, after that the elements are always also compared by value. The reason is that even with good hash functions you might have collisions where two different keys yield the same hash value - but you still need to be able to save both entries in the hashmap. There are various strategies to handle that, you can find more information on looking for collision resolution for hash maps.

In your examples both entries have the same hash value but different values. The values are just compared by the standard comparison function, which compares the char* pointers, which are different. Therefore the value comparison fails and you get two entries in the map. To solve your issue you also need to define a custom equality function for your hash map, which can be done by specifiying the fourth template parameter KeyEqual for std::unordered_map.

like image 108
Matthias247 Avatar answered Nov 12 '22 13:11

Matthias247


This fails because the unordered_map does not and cannot solely rely on the hash function for the key to differentiate keys, but it must also compare keys with the same hash for equality. And comparing two char pointers compares the address pointed to.

If you want to change the comparison, pass a KeyEqual parameter to the map in addition to the hash.

struct MyKeyEqual
{
    bool operator()(MyKey const &lhs, MyKey const &rhs) const
    {
        return std::strcmp(lhs, rhs) == 0;
    }
};
like image 37
PaulR Avatar answered Nov 12 '22 13:11

PaulR


unordered_map needs to be able to perform two operations on the key - checking equality, and obtaining hash code. Naturally, two unequal keys are allowed to have different hash codes. When this happens, unordered map applies hash collision resolution strategy to treat these unequal keys as distinct.

That is precisely what happens when you supply a character pointer for the key, and provide an implementation of hash to it: the default equality comparison for pointers kicks in, so two different pointers produce two different keys, even though the content of the corresponding C strings is the same.

You can fix it by providing a custom implementation of KeyEqual template parameter to perform actual comparison of C strings, for example, by calling strcmp:

return !strcmp(lhsKey, rhsKey);
like image 1
Sergey Kalinichenko Avatar answered Nov 12 '22 11:11

Sergey Kalinichenko