Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a need for transitivity in Python __eq__?

I'm implementing my own class, with custom __eq__. And I'd like to return True for things that are not "equal" in a mathematical sense, but "match" in a fuzzy way.

An issue with this is, however, that this leads to loss of transitivity in a mathematical sense, i.e. a == b && b ==c, while a may not be equal to c.

Question: is Python dependent on __eq__ being transitive? Will what I'm trying to do break things, or is it possible to do this as long as I'm careful myself not to assume transitivity?

Use case

I want to match telephone numbers with one another, while those may be either formatted internationally, or just for domestic use (without a country code specified). If there's no country code specified, I'd like a number to be equal to a number with one, but if it is specified, it should only be equal to numbers with the same country-code, or without one.

So:

  • Of course, +31 6 12345678 should equal +31 6 12345678, and 06 12345678 should equal 06 12345678
  • +31 6 12345678 should equal 06 12345678 (and v.v.)
  • +49 6 12345678 should equal 06 12345678 (and v.v.)
  • But +31 6 12345678 should not be equal to +49 6 12345678

I don't have a need for hashing (and so won't implement it), so that at least makes life easier.

like image 436
Emil Bode Avatar asked Sep 16 '25 03:09

Emil Bode


2 Answers

There is no MUST but a SHOULD relation for comparisons being consistent with the commonly understood relations. Python expressively does not enforce this and float is an inbuilt type with different behaviour due to float("nan").

Expressions: Value comparisons

[…]
User-defined classes that customize their comparison behavior should follow some consistency rules, if possible:

  • […]
  • Comparison should be symmetric. In other words, the following expressions should have the same result:
    • x == y and y == x
    • x != y and y != x
    • x < y and y > x
    • x <= y and y >= x
  • Comparison should be transitive. The following (non-exhaustive) examples illustrate that:
    • x > y and y > z implies x > z
    • x < y and y <= z implies x < z

Python does not enforce these consistency rules. In fact, the not-a-number values are an example for not following these rules.

Still, keep in mind that exceptions are incredibly rare and subject to being ignored: most people would treat float as having total order, for example. Using uncommon comparison relations can seriously increase maintenance effort.


Canonical ways to model "fuzzy matching" via operators are as subset, subsequence or containment using unsymmetric operators.

  • The set and frozenset support >, >= and so on to indicate that one set encompases all values of another.
    >>> a, b = {1, 5, 6, 8}, {5, 6}
    >>> a >= a, a >= b, b >= a
    (True, True, False)
    
  • The str and bytes support in to indicate that subsequences are covered.
    >>> a, b = "+31 6 12345678", "6 12345678"
    >>> a in b, b in a
    (False, True)
    
  • The range and ipaddress Networks support in to indicate that specific items are covered.
    >>> IPv4Address('192.0.2.6') in IPv4Network('192.0.2.0/28')
    True
    

Notably, while these operators may be transitive they are not symmetric. For example, a >= b and c >= b does not imply b >= c and thus not a >= c or vice versa.

Practically, one could model "number without country code" as the superset of "number with country code" for the same number. This means that 06 12345678 >= +31 6 12345678 and 06 12345678 >= +49 6 12345678 but not vice versa. In order to do a symmetric comparison, one would use a >= b or b >= a instead of a == b.

like image 82
MisterMiyagi Avatar answered Sep 18 '25 18:09

MisterMiyagi


__eq__ method should be transitive; at least it is what dictionaries assume.

class A:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        for element in self.values:
            if element is other:
                return True
        return False

    def __hash__(self):
        return 0

    def __repr__(self):
        return self.name

x, y, z = A('x'), A('y'), A('z')
x.values = [x,y]
y.values = [x,y,z]
z.values = [y,z]

print(x == y)
--> True

print (y == z)
--> True

print(x == z)
--> False

print({**{x:1},**{y:2, z: 3}})
--> {x: 3}

print({**{x:1},**{z:3, y:2}})
--> {x: 1, z: 2}

{**{x:1},**{y: 2, z:3}} is the union of two dictionaries. No one expects a dictionary to delete a key after updating it.

print(z in {**{x:1},**{y:2, z: 3}})
--> False

By changing the order in the union you can even get different sized dictionaries:

print(len({**{x:1},**{y:2, z: 3}}))
--> 1

print(len({**{x:1},**{z:3, y:2}}))
--> 2
like image 22
Fırat Kıyak Avatar answered Sep 18 '25 16:09

Fırat Kıyak