Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does std::unordered_set allow insertion of duplicate elements?

Tags:

c++

c++11

hash

I am not understanding something right. I am under the impression that an unordered_set will not allow duplicate elements, based on their hash.

I have a struct, with a specialisation of std::hash, which appears to allow duplicates, although I have manually checked it

AddEdge ( const std::shared_ptr<Relation> relation, const std::shared_ptr<Concept> concept )
{
    auto edge = std::make_shared<Edge>( (Edge){ relation, concept } );
    auto res = _edges.insert ( edge );
    return res.second;
}

An overloaded function does exactly the same but for reversed parameters

This is how struct Edge is hashed:

namespace std
{
    template<> struct hash<shared_ptr<Edge>>
    {
        size_t operator()( const shared_ptr<Edge> & ptr ) const
        {
            size_t from = 0;
            size_t to = 0;

            if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->from ) )
                from = hash<shared_ptr<Concept>>()( node );

            else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->from ) )
                from = hash<shared_ptr<Relation>>()( node );

            if ( auto node = std::dynamic_pointer_cast<Concept>( ptr->to ) )
                 to = hash<shared_ptr<Concept>>()( node );

            else if ( auto node = std::dynamic_pointer_cast<Relation>( ptr->to ) )
                 to =  hash<shared_ptr<Relation>>()( node );

            return hash<size_t>()( from + to );
        }
    };
}

And the container held in:

std::unordered_set<std::shared_ptr<Edge>> _edges;

When I do:

graph2->AddEdge( sea_node, is_node );
graph2->AddEdge( is_node, blue_node );

I get:

Edge [sea,is] hash = 10017731961838884781
Edge [is,blue] hash = 11178184051384786808

I try a second time exactly the same, and I get the same hashes, yet, when I check the edges, I now have 4 edges instead of 2.

What am I doing wrong?

EDIT: class Concept & Relation have the same kind of hash function:

namespace std
{
    template<> struct hash<shared_ptr<Concept>>
    {
        size_t operator()( const shared_ptr<Concept> & ptr ) const
        {
            return hash<string>()( ptr->asToken()->value() ) + hash<int>()( ptr->TokenIndex() ) + hash<string>()( "Concept" );
        }
    };
}

Even more interestignly, my output from when I add Edges, produces the same hash, and yet it the duplicate Edge is added.

like image 335
Ælex Avatar asked Nov 26 '25 21:11

Ælex


1 Answers

an unordered_set will not allow duplicate elements, based on their hash

No, unordered_set avoids duplicates by comparing values, not the hashes of those values.
The "values" of each of your shared pointers is going to differ because they refer to different objects.

You can actually change this behaviour by providing your own function as the KeyEqual template parameter to unordered_set:

template<
    class Key,
    class Hash = std::hash<Key>,           // <-- you've been looking at this
    class KeyEqual = std::equal_to<Key>,   // <-- instead of this
    class Allocator = std::allocator<Key>
> class unordered_set;

If only one value with a given hash were allowed in an unordered_set then (a) you'd be unable to add any values that genuinely resulted in hash collisions, and (b) the entire hash collision resolution mechanism would become entirely unnecessary.

like image 177
Lightness Races in Orbit Avatar answered Nov 29 '25 16:11

Lightness Races in Orbit



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!