Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashSet<T>.RemoveWhere() and GetHashCode()

Tags:

c#

Aloha,

Here's a simple class that overrides GetHashCode:

class OverridesGetHashCode
{
    public string Text { get; set; }

    public override int GetHashCode()
    {
        return (Text != null ? Text.GetHashCode() : 0);
    }
    // overriding Equals() doesn't change anything, so I'll leave it out for brevity
}

When I create an instance of that class, add it to a HashSet and then change its Text property, like this:

var hashset = new HashSet<OverridesGetHashCode>();
var oghc = new OverridesGetHashCode { Text = "1" };
hashset.Add(oghc);
oghc.Text = "2";

then this doesn't work:

var removedCount = hashset.RemoveWhere(c => ReferenceEquals(c, oghc));
// fails, nothing is removed
Assert.IsTrue(removedCount == 1);

and neither does this:

// this line works, i.e. it does find a single item matching the predicate
var existing = hashset.Single(c => ReferenceEquals(c, oghc));
// but this fails; nothing is removed again
var removed = hashset.Remove(existing);
Assert.IsTrue(removed); 

I guess the hash it internally uses is generated when item is inserted and, if that's true, it's understandable that hashset.Contains(oghc) doesn't work. I also guess it looks up item by its hash code and if it finds a match, only then it checks the predicate, and that might be why the first test fails (again, I'm just guessing here). But why does the last test fail, I just got that object out of the hashset? Am I missing something, is this a wrong way to remove something from a HashSet?

Thank you for taking the time to read this.

UPDATE: To avoid confusion, here's the Equals():

protected bool Equals(OverridesGetHashCode other)
    {
        return string.Equals(Text, other.Text);
    }

public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((OverridesGetHashCode) obj);
    }
like image 288
Viet Norm Avatar asked Aug 07 '12 14:08

Viet Norm


People also ask

Does HashSet allow duplicates C#?

A HashSet<T> collection is not sorted and cannot contain duplicate elements.

How does a HashSet work c#?

HashSet(IEnumerable) Initializes a new instance of the HashSet class that uses the default equality comparer for the set type, contains elements copied from the specified collection, and has sufficient capacity to accommodate the number of elements copied.


2 Answers

By changing the hash code of your object while that object is being used in a HashSet is a violation of the HashSet's contract.

Being unable to remove the object is not the problem here. You are not allowed to change the hash code in the first place.

Let me quote from MSDN:

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

They tell the story a little differently but the essence is the same. They say, the hash code can never change. In practice, you can change it as long as you make sure no one uses the old hash code anymore. Not that this is good practice, but it works.

like image 191
usr Avatar answered Oct 24 '22 16:10

usr


It's important that any items added to a hash based table (HashSet, Dictionary, etc.) not be modified once they are inserted into the structure (at least not until they are removed).

To find an object in the data structure it computes it hash code, and then finds a location based on that hash code. If you mutate that object then the hash code it returns no longer reflects it's current location in that data structure (unless you're very, very lucky and it just happens to be a hash collision).

On the MSDN page for Dictionary is says:

As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value.

This same assertion applies to HashSet as well, as they both are implemented using hash tables.

like image 28
Servy Avatar answered Oct 24 '22 16:10

Servy