Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Dictionary use the Equatable protocol in Swift?

In order to solve this question, I have been playing around with a custom struct that implements the Hashable Protocol. I'm trying to see how many times the equivalency operator overload (==) gets called depending on if there is a hash collision or not when populating a Dictionary.

Update

@matt wrote a much cleaner example of a custom struct that implements the Hashable protocol and shows how often hashValue and == get called. I am copying his code below. To see my original example, check out the edit history.

struct S : Hashable {
    static func ==(lhs:S,rhs:S) -> Bool {
        print("called == for", lhs.id, rhs.id)
        return lhs.id == rhs.id
    }
    let id : Int
    var hashValue : Int {
        print("called hashValue for", self.id)
        return self.id
    }
    init(_ id:Int) {self.id = id}
}
var s = Set<S>()
for i in 1...5 {
    print("inserting", i)
    s.insert(S(i))
}

This produces the results:

/*
inserting 1
called hashValue for 1
inserting 2
called hashValue for 2
called == for 1 2
called hashValue for 1
called hashValue for 2
inserting 3
called hashValue for 3
inserting 4
called hashValue for 4
called == for 3 4
called == for 1 4
called hashValue for 2
called hashValue for 3
called hashValue for 1
called hashValue for 4
called == for 3 4
called == for 1 4
inserting 5
called hashValue for 5
*/

Since Hashable uses Equatable to differentiate hash collisions (I assume anyway), I would expect func ==() only to be called when there are hash collisions. However, there are no hash collisions at all in @matt's example above and yet == is still being called. In my other experiments forcing hash collisions (see this question's edit history), == seemed to be called a random number of times.

What is going on here?

like image 446
Suragch Avatar asked Jul 28 '15 01:07

Suragch


People also ask

What is the use of Equatable protocol in Swift?

In Swift, an Equatable is a protocol that allows two objects to be compared using the == operator. The hashValue is used to compare two instances. To use the hashValue , we first have to conform (associate) the type (struct, class, etc) to Hashable property.

What is dictionary in Swift?

Swift dictionary is an unordered collection of items. It stores elements in key/value pairs. Here, keys are unique identifiers that are associated with each value.

What is the difference between equitable and comparable protocol?

“Equatable” relates to being equal, and “comparable” relates to the comparison between objects. This is important, because how can we be certain that two complex objects are the same? In many circumstances, this is something that you should decide.


2 Answers

I'm copying my answer from bugs.swift.org here. It talks about Sets but the details apply to Dictionaries in the same way.

In hashed collections, collisions can occur whenever the number of buckets is smaller than the keyspace. When you're creating a new Set without specifying a minimum capacity, the set might have only one bucket, so when you're inserting the second element, a collision occurs. The insert method will then decide if the storage should be grown, using something called a load factor. If the storage was grown, the existing elements have to be migrated over to the new storage buffer. That's when you're seeing all those extra calls to hashValue when inserting 4.

The reason you're still seeing more calls to == than you would expect if the number of buckets is equal or higher to the number of elements has to do with an implementation detail of the bucket index calculation. The bits of the hashValue are mixed or "shuffled" before the modulo operation. This is to cut down on excessive collisions for types with bad hash algorithms.

like image 130
robinkunde Avatar answered Oct 29 '22 17:10

robinkunde


Well, there's your answer:

https://bugs.swift.org/browse/SR-3330?focusedCommentId=19980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-19980

What's actually happening:

  • We hash a value only once on insertion.
  • We don't use hashes for comparison of elements, only ==. Using hashes for comparison is only reasonable if you store the hashes, but that means more memory usage for every Dictionary. A compromise that needs evaluation.
  • We try to insert the element before evaluating if the Dictionary can fit that element. This is because the element might already be in the Dictionary, in which case we don't need any more capacity.
  • When we resize the Dictionary, we have to rehash everything, because we didn't store the hashes.

So what you're seeing is:

  • one hash of the search key
  • some =='s (searching for a space)
  • hashes of every element in the collection (resize)
  • one hash of the search key (actually totally wasteful, but not a big deal considering it only happens after an Ođź‘Ž reallocation)
  • some =='s (searching for a space in the new buffer)

We all had it totally wrong. They don't use the hashes at all — only == — to decide whether this is a distinct key. And then there is a second round of calls in the situation where the collection is grown.

like image 45
matt Avatar answered Oct 29 '22 17:10

matt