How is frozenset equality implemented in CPython? In particular, I want to know how individual elements in the fronzenset are compared with each other and their total time complexity.
I took a look at set and frozenset difference in implementation and https://stackoverflow.com/a/51778365/5792721.
The answer in the second link covers steps before comparing individual elements in the set, but missing the actual comparison algorithm.
The implementation in question can be found here. The algorithm goes something like this, given that we want to check if two sets
v
and w
are equal:
false
. This references the size
attribute of the PySetObject
, so it's a constant time operation.
sets
are frozensets
. If so, and their hashes are different, return false
.Again, this references the hash
attribute of the PySetObject
, which is a constant time operation. This is possible because frozensets
are immutable objects and hence have their hashes calculated when they are created.
w
is a subset of v
.This is done by iterating through each element of v
and checking if it is present in w
.
The iteration must be performed over all of v
, and is therefore at worst linear in its size (if a differing element is found in the last position).
Membership testing for sets
in general takes constant time in the average case and linear time in the worst case; combining the two gives time linear in the size of v
in the average case and time proportional to len(v) * len(w)
in the worst case.
This is, in a sense, the average case of the worst case, since it assumes that the first two checks always return true
. If your data only rarely tends to have equal-size sets
which also have the same hashes but contain different objects, then in the average case the comparison will still run in constant time.
Both set and frozenset use the set_richcompare()
function for equality tests. There's no difference here.
set_richcompare()
, as the other answers mention, compares sizes and then hashes. Then, it checks whether one is a subset of the other, which just loops over every element in set 1, and checks if it is in set 2. This check is otherwise identical to checking whether an element is in a set. This is done by hash-table lookup, with a linear search over items with equal hashes. The complexity for the existence check is O(1) amortized or O(len(s)) worst-case, so the check for set_issubset()
has a complexity of O(len(s1)) amortized, or O(len(s1)*len(s2)) worst-case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With