When working with HashSets in C#, I recently came across an annoying problem: HashSets don't guarantee uniqueness of their elements; they are not true sets. What they do guarantee is that when Add(T item) is called, the item is not added if item.Equals(that) is true for any item already in the set. This no longer holds if you manipulate items that are already in the set. A small program that demonstrates this (copied from my LINQPad):
void Main()
{
    HashSet<Tester> testset = new HashSet<Tester>();
    testset.Add(new Tester(1));
    testset.Add(new Tester(2));

    // Dumps 1 and 2: two distinct items.
    foreach (Tester tester in testset)
    {
        tester.Dump();
    }

    // Mutate the items in place so that they become equal.
    foreach (Tester tester in testset)
    {
        tester.myint = 3;
    }

    // Dumps 3 twice: the set now holds two equal items.
    foreach (Tester tester in testset)
    {
        tester.Dump();
    }

    // Building a new set re-checks equality, so only one 3 survives.
    HashSet<Tester> secondhashset = new HashSet<Tester>(testset);
    foreach (Tester tester in secondhashset)
    {
        tester.Dump();
    }
}

class Tester
{
    public int myint;

    public Tester(int i)
    {
        this.myint = i;
    }

    public override bool Equals(object o)
    {
        if (o == null) return false;
        Tester that = o as Tester;
        if (that == null) return false;
        return (this.myint == that.myint);
    }

    public override int GetHashCode()
    {
        return this.myint;
    }

    public override string ToString()
    {
        return this.myint.ToString();
    }
}
It will happily manipulate the items in the collection to be equal, only filtering them out when a new HashSet is built. What is advisable when I want to work with sets where I need to know the entries are unique? Roll my own, where Add(T item) adds a copy of the item, and the enumerator enumerates over copies of the contained items? This presents the challenge that every contained element must be deep-copyable, at least in the members that influence its equality.
Another solution would be to roll my own set that only accepts elements implementing INotifyPropertyChanged, and that re-checks equality whenever the event fires, but this seems severely limiting, not to mention a whole lot of work and a performance loss under the hood.
Yet another possible solution I thought of is making sure that all fields are readonly or const, so they can only be set in the constructor. All of these solutions seem to have very large drawbacks. Do I have any other options?
HashSet<T> is an implementation of a set collection (it implements ISet<T>). Therefore, a HashSet is a collection of unique elements. In other words, if you try to add an object to a HashSet and an equal object is already present, the HashSet ignores it and Add returns false. You can add one object at a time with Add, or add in bulk from another collection with UnionWith or the constructor that takes an IEnumerable<T>.
The order in which a HashSet enumerates its elements is not guaranteed to match the order in which they were inserted. Elements are placed according to their hash code. A null element is allowed when T is a reference type. HashSet<T> also implements ISerializable.
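As a rough illustration of the behaviour described above, the following sketch shows a duplicate being ignored, a bulk insert, and a null element. The class name HashSetBasics and the method Build are made up for this example; only HashSet<T>, Add, and UnionWith come from the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; not part of the original question.
public static class HashSetBasics
{
    // Builds a small set and returns it so the result can be inspected.
    public static HashSet<string> Build()
    {
        var set = new HashSet<string>();

        bool first = set.Add("a");   // true: "a" was not present yet
        bool second = set.Add("a");  // false: an equal element already exists
        Console.WriteLine($"{first} {second}");

        // Bulk insert; elements that are already present are skipped.
        set.UnionWith(new[] { "a", "b", "c" });

        // null is a valid element when T is a reference type.
        set.Add(null);

        return set;
    }

    public static void Main()
    {
        HashSet<string> set = Build();
        Console.WriteLine(set.Count); // 4: "a", "b", "c" and null
    }
}
```

Note that the uniqueness check happens at insertion time, which is why mutating an element afterwards is not detected.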
You're really talking about object identity. If you're going to hash items, they need to have some kind of identity so they can be compared. You currently declare public int myint. It really should be readonly, and only set in the constructor. This is a problem with your Tester objects, not the set. You need to think hard about how you define identity. It's not an easy problem.
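A minimal sketch of what that could look like, assuming the integer is the only thing that defines a Tester's identity. The names ImmutableTester, MyInt, and WithValue are invented for illustration and do not appear in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable variant of Tester: the value that defines equality
// can only be set in the constructor, so the hash code of an instance never
// changes while it sits in a set.
public sealed class ImmutableTester : IEquatable<ImmutableTester>
{
    private readonly int myint;

    public ImmutableTester(int i) { myint = i; }

    public int MyInt { get { return myint; } }

    // "Changing" the value yields a new object instead of mutating this one.
    public ImmutableTester WithValue(int i) { return new ImmutableTester(i); }

    public bool Equals(ImmutableTester other) { return other != null && myint == other.myint; }
    public override bool Equals(object o) { return Equals(o as ImmutableTester); }
    public override int GetHashCode() { return myint; }
    public override string ToString() { return myint.ToString(); }
}

public static class Program
{
    public static void Main()
    {
        var set = new HashSet<ImmutableTester>
        {
            new ImmutableTester(1),
            new ImmutableTester(2)
        };

        // To "edit" an element: remove it, then add the replacement.
        // Add reports whether the replacement was actually inserted.
        var old = new ImmutableTester(1);
        set.Remove(old);
        bool added = set.Add(old.WithValue(2));

        Console.WriteLine(added);     // False: an equal element (2) already exists
        Console.WriteLine(set.Count); // 1
    }
}
```

Because every change goes through Remove and Add, the set gets a chance to enforce uniqueness each time, at the cost of allocating a new object per change.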