My custom structure implements the Hashable Protocol. However, when hash collisions occur while inserting keys in a Dictionary
, they are not automatically handled. How do I overcome this problem?
I had previously asked this question How to implement the Hashable Protocol in Swift for an Int array (a custom string struct). Later I added my own answer, which seemed to be working.
However, recently I have detected a subtle problem with hashValue
collisions when using a Dictionary
.
I have simplified the code down as far as I can to the following example.
Custom structure
struct MyStructure: Hashable {
var id: Int
init(id: Int) {
self.id = id
}
var hashValue: Int {
get {
// contrived to produce a hashValue collision for id=1 and id=2
if id == 1 {
return 2
}
return id
}
}
}
func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
return lhs.hashValue == rhs.hashValue
}
Note the global function to overload the equality operator (==) in order to conform to the Equatable Protocol, which is required by the Hashable Protocol.
Subtle Dictionary key problem
If I create a Dictionary
with MyStructure
as the key
var dictionary = [MyStructure : String]()
let ok = MyStructure(id: 0) // hashValue = 0
let collision1 = MyStructure(id: 1) // hashValue = 2
let collision2 = MyStructure(id: 2) // hashValue = 2
dictionary[ok] = "some text"
dictionary[collision1] = "other text"
dictionary[collision2] = "more text"
print(dictionary) // [MyStructure(id: 2): more text, MyStructure(id: 0): some text]
print(dictionary.count) // 2
the equal hash values cause the collision1
key to be overwritten by the collision2
key. There is no warning. If such a collision only happened once in a dictionary with 100 keys, then it could easily be missed. (It took me quite a while to notice this problem.)
Obvious problem with Dictionary literal
If I repeat this with a dictionary literal, though, the problem becomes much more obvious because a fatal error is thrown.
let ok = MyStructure(id: 0) // hashValue = 0
let collision1 = MyStructure(id: 1) // hashValue = 2
let collision2 = MyStructure(id: 2) // hashValue = 2
let dictionaryLiteral = [
ok : "some text",
collision1 : "other text",
collision2 : "more text"
]
// fatal error: Dictionary literal contains duplicate keys
I was under the impression that it was not necessary for hashValue
to always return a unique value. For example, Mattt Thompson says,
One of the most common misconceptions about implementing a custom hash function comes from ... thinking that hash values must be distinct.
And the respected SO user @Gaffa says that one way to handle hash collisions is to
Consider hash codes to be non-unique, and use an equality comparer for the actual data to determine uniqueness.
In my opinion, the question Do swift hashable protocol hash functions need to return unique values? has not been adequately answered at the time of this writing.
After reading the Swift Dictionary
question How are hash collisions handled?, I assumed that Swift automatically handled hash collisions with Dictionary
. But apparently that is not true if I am using a custom class or structure.
This comment makes me think the answer is in how the Equatable protocol is implemented, but I am not sure how I should change it.
func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
return lhs.hashValue == rhs.hashValue
}
Is this function called for every dictionary key lookup or only when there is a hash collision? (Update: see this question)
What should I do to determine uniqueness when (and only when) a hash collision occurs?
Hash collision is resolved by open addressing with linear probing. Since CodeMonk and Hashing are hashed to the same index i.e. 2, store Hashing at 3 as the interval between successive probes is 1. There are no more than 20 elements in the data set. Hash function will return an integer from 0 to 19.
3) Collision Handling: Since a hash function gets us a small number for a big key, there is possibility that two keys result in same value. The situation where a newly inserted key maps to an already occupied slot in hash table is called collision and must be handled using some collision handling technique.
Swift dictionary is made of two generic types: Key (which has to be Hashable ) and Value. An entry can be made by providing a key and its value. A value can be retrieved by providing a key which has been inserted before. A entry can be deleted by providing a key.
With the help of the message pre-processing operations, hash collisions are avoided if it is applied before hashing any message. Typically, the message pre-processing function comprises 4 steps: Shuffling of bits, Compression, T-function and LFSR. These steps increase the entropy of the input message.
In this article, we have explored the idea of collision in hashing and explored different collision resolution techniques such as: Hash table: a data structure where the data is stored based upon its hashed key which is obtained using a hashing function. Hash function: a function which for a given data, outputs a value mapped to a fixed range.
In Swift, you may have seen a protocol called Hashable. Anything that conforms to Hashable needs to have a computed integer property called hashValue. How that property is implemented, determines whether will work as your hash function. You can use any type that conforms to the Hashable protocol in a set or as a dictionary key.
1. Open Hashing (Separate chaining) Collisions are resolved using a list of elements to store objects with the same key together. Suppose you wish to store a set of numbers = {0,1,2,4,5,7} into a hash table of size 5. Now, assume that we have a hash function H, such that H (x) = x%5
From matching data to ensuring security, hashing is an excellent way to keep track of the identity of a piece of data. One use for a hashing is for trying to locate duplicate files. Because a hash algorithm theoretically creates a unique output for a given input, if two datasets are identical they should return the same hash.
func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
return lhs.hashValue == rhs.hashValue
}
Note the global function to overload the equality operator (==) in order to conform to the Equatable Protocol, which is required by the Hashable Protocol.
Your problem is an incorrect equality implementation.
A hash table (such as a Swift Dictionary or Set) requires separate equality and hash implementations.
hash gets you close to the object you're looking for; equality gets you the exact object you're looking for.
Your code uses the same implementation for hash and equality, and this will guarantee a collision.
To fix the problem, implement equality to match exact object values (however your model defines equality). E.g.:
func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
return lhs.id == rhs.id
}
I think you have all the pieces of the puzzle you need -- you just need to put them together. You have a bunch of great sources.
Hash collisions are okay. If a hash collision occurs, objects will be checked for equality instead (only against the objects with matching hashes). For this reason, objects' Equatable
conformance needs to be based on something other than hashValue
, unless you are certain that hashes cannot collide.
This is the exact reason that objects that conform to Hashable
must also conform to Equatable
. Swift needs a more domain-specific comparison method for when hashing doesn't cut it.
In that same NSHipster article, you can see how Mattt implements isEqual:
versus hash
in his example Person
class. Specifically, he has an isEqualToPerson:
method that checks against other properties of a person (birthdate, full name) to determine equality.
- (BOOL)isEqualToPerson:(Person *)person {
if (!person) {
return NO;
}
BOOL haveEqualNames = (!self.name && !person.name) || [self.name isEqualToString:person.name];
BOOL haveEqualBirthdays = (!self.birthday && !person.birthday) || [self.birthday isEqualToDate:person.birthday];
return haveEqualNames && haveEqualBirthdays;
}
He does not use a hash value when checking for equality - he uses properties specific to his person class.
Likewise, Swift does not let you simply use a Hashable
object as a dictionary key -- implicitly, by protocol inheritance -- keys must conform to Equatable
as well. For standard library Swift types this has already been taken care of, but for your custom types and class, you must create your own ==
implementation. This is why Swift does not automatically handle dictionary collisions with custom types - you must implement Equatable
yourself!
As a parting thought, Mattt also states that you can often just do an identity check to make sure your two objects are at different memory address, and thus different objects. In Swift, that would like like this:
if person1 === person2 {
// ...
}
There is no guarantee here that person1
and person2
have different properties, just that they occupy separate space in memory. Conversely, in the earlier isEqualToPerson:
method, there is no guarantee that two people with the same names and birthdates are actually same people. Thus, you have to consider what makes sense for you specific object type. Again, another reason that Swift does not implement Equatable
for you on custom types.
the equal hash values cause the collision1 key to be overwritten by the collision2 key. There is no warning. If such a collision only happened once in a dictionary with 100 keys, then it could easily be missed.
Hash collision has nothing to do with it. (Hash collisions never affect the result, only the performance.) It is working exactly as documented.
Dictionary
operations work on equality (==
) of keys. Dictionaries do not contain duplicate keys (meaning keys that are equal). When you set a value with a key, it overwrites any entry containing an equal key. When you get an entry with a subscript, it finds a value with a key that is equal to, not necessarily the same as, the thing you gave. And so on.
collision1
and collision2
are equal (==
), based on the way you defined the ==
operator. Therefore, setting an entry with key collision2
must overwrite any entry with key collision1
.
P.S. The same exact thing applies with dictionaries in other languages. For example, in Cocoa, NSDictionary
does not allow duplicate keys, i.e. keys that are isEqual:
. In Java, Map
s do not allow duplicate keys, i.e. keys that are .equals()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With