Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is frozenset equality implemented in Python?

How is frozenset equality implemented in CPython? In particular, I want to know how individual elements in the fronzenset are compared with each other and their total time complexity.

I took a look at set and frozenset difference in implementation and https://stackoverflow.com/a/51778365/5792721.

The answer in the second link covers steps before comparing individual elements in the set, but missing the actual comparison algorithm.

like image 287
James Parker Avatar asked Dec 17 '22 16:12

James Parker


2 Answers

The implementation in question can be found here. The algorithm goes something like this, given that we want to check if two sets v and w are equal:

  1. Check if their sizes are equal. If they are not, return false.

This references the size attribute of the PySetObject, so it's a constant time operation.

  1. Check if both sets are frozensets. If so, and their hashes are different, return false.

Again, this references the hash attribute of the PySetObject, which is a constant time operation. This is possible because frozensets are immutable objects and hence have their hashes calculated when they are created.

  1. Check if w is a subset of v.

This is done by iterating through each element of v and checking if it is present in w.

The iteration must be performed over all of v, and is therefore at worst linear in its size (if a differing element is found in the last position).

Membership testing for sets in general takes constant time in the average case and linear time in the worst case; combining the two gives time linear in the size of v in the average case and time proportional to len(v) * len(w) in the worst case.

This is, in a sense, the average case of the worst case, since it assumes that the first two checks always return true. If your data only rarely tends to have equal-size sets which also have the same hashes but contain different objects, then in the average case the comparison will still run in constant time.

like image 134
gmds Avatar answered Dec 28 '22 08:12

gmds


Both set and frozenset use the set_richcompare() function for equality tests. There's no difference here.

set_richcompare(), as the other answers mention, compares sizes and then hashes. Then, it checks whether one is a subset of the other, which just loops over every element in set 1, and checks if it is in set 2. This check is otherwise identical to checking whether an element is in a set. This is done by hash-table lookup, with a linear search over items with equal hashes. The complexity for the existence check is O(1) amortized or O(len(s)) worst-case, so the check for set_issubset() has a complexity of O(len(s1)) amortized, or O(len(s1)*len(s2)) worst-case.

like image 25
muru Avatar answered Dec 28 '22 10:12

muru