Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it right to conform Hashable by only taking id into consideration?

Tags:

swift

hashable

A lot of online example I have came across, when they try to conform to Hashable, they only take id as consideration. For instance https://www.raywenderlich.com/8241072-ios-tutorial-collection-view-and-diffable-data-source , https://medium.com/@JoyceMatos/hashable-protocols-in-swift-baf0cabeaebd , ...

/// Copyright (c) 2020 Razeware LLC
/// 
/// Permission is hereby granted, free of charge, to any person obtaining a copy
/// of this software and associated documentation files (the "Software"), to deal
/// in the Software without restriction, including without limitation the rights
/// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
/// copies of the Software, and to permit persons to whom the Software is
/// furnished to do so, subject to the following conditions:
/// 
/// The above copyright notice and this permission notice shall be included in
/// all copies or substantial portions of the Software.
/// 
/// Notwithstanding the foregoing, you may not use, copy, modify, merge, publish,
/// distribute, sublicense, create a derivative work, and/or sell copies of the
/// Software in any work that is designed, intended, or marketed for pedagogical or
/// instructional purposes related to programming, coding, application development,
/// or information technology.  Permission for such use, copying, modification,
/// merger, publication, distribution, sublicensing, creation of derivative works,
/// or sale is expressly withheld.
/// 
/// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
/// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
/// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
/// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
/// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
/// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
/// THE SOFTWARE.

import UIKit

class Video: Hashable {
  var id = UUID()
  var title: String
  var thumbnail: UIImage?
  var lessonCount: Int
  var link: URL?
  
  init(title: String, thumbnail: UIImage? = nil, lessonCount: Int, link: URL?) {
    self.title = title
    self.thumbnail = thumbnail
    self.lessonCount = lessonCount
    self.link = link
  }
  // 1
  func hash(into hasher: inout Hasher) {
    // 2
    hasher.combine(id)
  }
  // 3
  static func == (lhs: Video, rhs: Video) -> Bool {
    lhs.id == rhs.id
  }
}

I was wondering, is that ever a correct way to conform Hashable? I thought we should take all class member variables, into consideration?

For instance, by using only id in func hash/ func ==, will yield the following misbehaviour.

We are going encounter 2 objects with different content, but func == will return true when comparing 2 objects with different content.

struct Dog: Hashable {
    let id = UUID()
    var name: String
    var age: Int
    
    init(name: String, age: Int) {
        self.name = name
        self.age = age
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }

    static func == (lhs: Dog, rhs: Dog) -> Bool {
        lhs.id == rhs.id
    }
}


var dog0 = Dog(name: "dog", age: 1)
var dog1 = dog0

/*
 dog0 is -5743610764084706839, dog, 1
 dog1 is -5743610764084706839, dog, 1
 compare dog0 with dog1 is true
 */
print("dog0 is \(dog0.hashValue), \(dog0.name), \(dog0.age)")
print("dog1 is \(dog1.hashValue), \(dog1.name), \(dog1.age)")
print("compare dog0 with dog1 is \(dog0 == dog1)")


dog1.name = "another name"
dog1.age = 9

// Same id, but different content!

/*
 dog0 is -5743610764084706839, dog, 1
 dog1 is -5743610764084706839, another name, 9
 compare dog0 with dog1 is true
 */
print("dog0 is \(dog0.hashValue), \(dog0.name), \(dog0.age)")
print("dog1 is \(dog1.hashValue), \(dog1.name), \(dog1.age)")
print("compare dog0 with dog1 is \(dog0 == dog1)")

I was wondering, is it right to conform Hashable by only taking id into consideration?


p/s

I try to look from other languages like Java, on what is the general advice regarding hash code generation . This is what is being written in their popular Effective Java book.

Do not be tempted to exclude significant fields from the hash code computation to improve performance. While the resulting hash function may run faster, its poor quality may degrade hash tables’ performance to the point where they become unusable. In particular, the hash function may be confronted with a large collection of instances that differ mainly in regions you’ve chosen to ignore. If this happens, the hash function will map all these instances to a few hash codes, and programs that should run in linear time will instead run in quadratic time. This is not just a theoretical problem. Prior to Java 2, the String hash function used at most sixteen characters evenly spaced throughout the string, starting with the first character. For large collections of hierarchical names, such as URLs, this function displayed exactly the pathological behavior described earlier.

like image 252
Cheok Yan Cheng Avatar asked Oct 09 '20 11:10

Cheok Yan Cheng


People also ask

Is hashable identifiable?

The hashable requirement here needs a unique identifier, like the identifiable protocol (why it asks for Hashable when it needs Identifiable is beyond me). To use both the Hashable and the Equatable protocol, you need to tell the Hashable protocol to focus on the id, so yes, that unique Identifiable value.

Can a protocol conform to hashable?

You can use any type that conforms to the Hashable protocol in a set or as a dictionary key. Many types in the standard library conform to Hashable : Strings, integers, floating-point and Boolean values, and even sets are hashable by default.

What does hashable mean in Swift?

In Swift, a Hashable is a protocol that provides a hashValue to our object. The hashValue is used to compare two instances. To use the hashValue , we first have to conform (associate) the type (struct, class, etc) to Hashable property.

Does Nsobject conform hashable?

NSObjectProtocol doesn't inherit from Hashable .


3 Answers

TL;DR: This hash function is unnecessary, but legal, and arguably ideal. This == is incorrect, despite being common in tutorials, because it breaks substitutability which is required by Equatable, exactly as you suggest.

However, as matt notes, diffable data sources may require this anyway. That doesn't make it good, but it may make it necessary. (Do read all of matt's comments below. They provide a lot of important context. In reference specifically to diffable data sources, see his answer; I am not particularly familiar with diffable data sources.)


I suggest turning to the documentation, which lays this out.

First, Hashable:

Hashing a value means feeding its essential components into a hash function, represented by the Hasher type. Essential components are those that contribute to the type’s implementation of Equatable. Two instances that are equal must feed the same values to Hasher in hash(into:), in the same order.

The most important thing is that Hashable be consistent with Equatable. Two things must never be equal, but have different hashes.

The converse is not true. It is completely valid for two unequal things to have the same hash. In fact, that's a fundamental fact of hashing called the pigeonhole principle. A good hash improves performance by avoiding unnecessary equality checks. But the following hash(into:) function is always valid:

func hash(into hasher: inout Hasher) {
    hasher.combine(0)
}

This just means that every value has the same hash, and so the system will always call ==. This is bad for performance (and in server applications that can translate into a denial of service attack called hash flooding). But it is legal.

If that's legal, certainly just hashing id is legal.

But....

That brings us to Equatable and its docs, and the most important paragraph (emphasis added):

Equality implies substitutability—any two instances that compare equally can be used interchangeably in any code that depends on their values. To maintain substitutability, the == operator should take into account all visible aspects of an Equatable type. Exposing nonvalue aspects of Equatable types other than class identity is discouraged, and any that are exposed should be explicitly pointed out in documentation.

A value must only be considered equal if they can be substituted for each other in any context, and it will not impact the correctness of the program. Clearly in your example, that's not true. In fact, it's never going to be true for a type with mutable public properties (despite many tutorials that get this wrong). So your == is incorrect. But your hash function is fine, arguably ideal. Its goal is to be a quick check for non-equality that minimizes collisions. If the ids are the same, you still have to check the rest of the values, but if they're different, you know it's not going to be equal.

If your Dog type were immutable (name and age were let rather than var), it might be acceptable to implement == this way. It's impossible to set the id by hand, so it would be impossible to get two values with the same id but different values. But I wouldn't do that unless you could show a significant performance boost. It hangs correctness on too subtle a requirement. For example, if an extension added an init that allowed setting id directly, it would make your == invalid. That's too fragile IMO.

How about private mutable state? As long as that is only for performance purposes (memoization/caching), then it's fine to leave out of == (and hash). But if that internal state can influence externally visible behavior, than it needs to be part of ==.

The good news is, most of the time you don't need to worry. Swift's automatic implementations handle this for you correctly out of the box, and compare all properties. So in your Dog example, the best solution is to just remove the methods (I'm sure you're aware of that; just stating it for folks reading along). Whenever possible, I highly recommend using the default conformances for Hashable and avoid writing your own.

But in cases where you have to implement your own, the rules are simple:

  • Two equal values must be perfectly substitutable in all cases without impacting correctness (though a substitution may impact performance)
  • Two equal values must always have the same hash

The guidelines are also fairly simple: Hashing should be fast, while minimizing collisions.


The one argument I've seen for these incorrect implementations of == is to try to make Set work nicely. IMO, this is a misuse of Set and Equatable, and is not promised to work in expected ways (if you insert a duplicate value with the same identifier, but different properties, it's undefined which of the values will be in the collection). You should not twist Equatable around wanting to use a specific data structure. You should use the data structure that matches your meaning.

In the common case, the right tool is Dictionary as [ID: Value]. It expresses what you really mean: a mapping between an ID and a single value for that ID, rather than an unordered bag of unique values.

There is likely a memory cost in using a Dictionary rather than a Set (since you have to duplicate the ID). But you should only try to work around that after proving there's a problem to be solved.


Also, see matt's comment below. I have not spent a lot of time with the new diffable data sources. I remember when I first saw them that I was concerned they might be misusing of Equatable. If that's true, then you may have to misuse Equatable to use them, and that would explain some tutorials that do it this way. That doesn't make it good Swift, but it may be required by Apple frameworks.


As I've studied Apple's code more (see matt's answer for many), I've noticed that they all follow the rule I discussed above: they are immutable and you cannot set the UUID during init. This construction makes it impossible for two values to have the same id but other values be different, so checking the id is always sufficient. But if you make the values mutable, or you allow the id to be anything other than let id = UUID(), then this construction becomes dangerous.

like image 149
Rob Napier Avatar answered Nov 07 '22 22:11

Rob Napier


That is completely fine. There is only one requirement for Hashable: If a == b then a.hashValue == b.hashValue must also be true. This is fulfilled here, so your struct will work as a dictionary key or as a set member.

Note that this also is fulfilled, if your hash(into:) doesn’t combine any data (or only constant data) into the hasher. This will make hash table lookups slow, but they will still work.

Another option is to compare all fields in your == implementation but only use a subset of them for hashing in hash(into:). That still follows the rules (the other way around is not allowed of course). This may be useful as a performance optimization, but it also may hurt performance. Depends on the distribution of the data you are hashing.

like image 31
Sven Avatar answered Nov 07 '22 20:11

Sven


Whether it is correct or not to only use a subset of properties for a Hashable conformance completely depends on your requirements.

If for a certain object, equality is really only defined by a single variable (or a subset of variables), than it is correct to use that subset of variables for the Hashable (and Equatable conformances).

However, if all properties of a type are required to decide whether two instances are equal or not, then you should use all properties.

like image 41
Dávid Pásztor Avatar answered Nov 07 '22 22:11

Dávid Pásztor