
Why are tuples constructed from differently initialized sets equal?

I expected the following two tuples

>>> x = tuple(set([1, "a", "b", "c", "z", "f"]))
>>> y = tuple(set(["a", "b", "c", "z", "f", 1]))

to compare unequal, but they don't:

>>> x == y
True

Why is that?

Ashish Anand asked Sep 30 '14 08:09



2 Answers

At first glance, it appears that x should always equal y, because two sets constructed from the same elements are always equal:

>>> x = set([1, "a", "b", "c", "z", "f"])
>>> y = set(["a", "b", "c", "z", "f", 1])
>>> x
{1, 'z', 'a', 'b', 'c', 'f'}
>>> y
{1, 'z', 'a', 'b', 'c', 'f'}
>>> x == y
True

However, it is not always the case that tuples (or other ordered collections) constructed from two equal sets are equal.

In fact, the result of your comparison is sometimes True and sometimes False, at least in Python >= 3.3. Testing the following code:

# compare.py
x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

... a thousand times:

$ for x in {1..1000}
> do
>   python3.3 compare.py
> done | sort | uniq -c
    147 False
    853 True

This is because, since Python 3.3, the hash values of strings, bytes and datetimes are randomized on each interpreter start as a result of a security fix. Depending on what the hashes are, "collisions" may occur, in which case the order in which items are stored in the underlying array (and therefore the iteration order) depends on the insertion order.

Here's the relevant bit from the docs:

Security improvements:

  • Hash randomization is switched on by default.

— https://docs.python.org/3/whatsnew/3.3.html
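If you'd rather reproduce the experiment from Python than from a shell loop, here is a rough sketch. It assumes a CPython 3.7+ interpreter and uses the PYTHONHASHSEED environment variable to pin the hash seed of each child process; the names SNIPPET and run_with_seed are made up for this example.

```python
import os
import subprocess
import sys
from collections import Counter

# The same comparison as compare.py, passed to a child interpreter via -c.
SNIPPET = (
    'x = tuple(set([1, "a", "b", "c", "z", "f"]))\n'
    'y = tuple(set(["a", "b", "c", "z", "f", 1]))\n'
    'print(x == y)'
)


def run_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with a fixed hash seed (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Tally the outcome over a handful of seeds; the exact split depends on
# the interpreter version and platform, so no particular ratio is expected.
counts = Counter(run_with_seed(seed) for seed in range(20))
print(counts)
```

With a fixed seed the result for that seed should be repeatable, which is what makes the behaviour look deterministic inside one session and random across invocations.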

EDIT: Since it's mentioned in the comments that the True/False ratio above is superficially surprising ...

Sets, like dictionaries, are implemented as hash tables - so if there's a collision, the order of items in the table (and so the order of iteration) will depend both on which item was added first (different in x and y in this case) and the seed used for hashing (different across Python invocations since 3.3). Since collisions are rare by design, and the examples in this question are smallish sets, the issue doesn't arise as often as one might initially suppose.
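To see a collision without any randomization, small integers are handy, since their hashes are not salted. The sketch below relies on CPython implementation details that are assumptions here, not language guarantees: a small int hashes to itself, and a fresh set starts out with 8 slots, so 0 and 8 compete for the same slot.

```python
# Two sets built from the same elements in a different insertion order.
a = set([0, 8])
b = set([8, 0])

# In CPython both values map to the same initial slot (assumed table size 8).
print(hash(0) % 8, hash(8) % 8)

# The sets are equal no matter how the items are laid out internally...
print(a == b)

# ...but whichever item was inserted first gets the contested slot, so the
# iteration order (and therefore the tuples) may differ between a and b.
print(list(a), list(b))
print(tuple(a) == tuple(b))
```

Strings behave the same way, except that which strings collide changes with the hash seed, hence the varying results from run to run.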

For a thorough explanation of Python's implementation of dictionaries and sets, see The Mighty Dictionary.

Zero Piraeus answered Sep 27 '22 21:09


There are two things at play here.

  1. Sets are unordered, so set([1, "a", "b", "c", "z", "f"]) == set(["a", "b", "c", "z", "f", 1]).

  2. When you convert a set to a tuple via the tuple constructor, it essentially iterates over the set and adds each element returned by the iteration.

The constructor syntax for tuples is

tuple(iterable) -> tuple initialized from iterable's items 

Calling tuple(set([1, "a", "b", "c", "z", "f"])) is the same as calling tuple([i for i in set([1, "a", "b", "c", "z", "f"])])

The values for

[i for i in set([1, "a", "b", "c", "z", "f"])] 

and

[i for i in set(["a", "b", "c", "z", "f", 1])] 

are the same as it iterates over the same set.

EDIT thanks to @ZeroPiraeus (check his answer). This is not guaranteed. The iteration order will not always be the same, even for two equal sets.

The tuple constructor doesn't know the order in which the set is constructed.
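A small sketch of both points, for what it's worth: tuple() simply consumes whatever order the set's iterator yields, and if you need a comparison that ignores that order you can compare the contents instead. The helper same_elements is a hypothetical name introduced for this example, not a built-in.

```python
from collections import Counter

s = set([1, "a", "b", "c", "z", "f"])
t = set(["a", "b", "c", "z", "f", 1])

# tuple(s) is just the set's iteration order, frozen into a tuple.
print(tuple(s) == tuple([i for i in s]))


def same_elements(x, y):
    """Order-independent comparison of two iterables of hashable items."""
    return Counter(x) == Counter(y)


# tuple(s) == tuple(t) may be True or False depending on the run,
# but comparing the contents does not depend on iteration order.
print(same_elements(tuple(s), tuple(t)))
```

Sorting both tuples first would not work here in Python 3, because 1 and "a" cannot be ordered against each other.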

srj answered Sep 27 '22 19:09