Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

'order' of unordered Python sets

I understand that sets in Python are unordered, but I'm curious about the 'order' they're displayed in, as it seems to be consistent. They seem to be out-of-order in the same way every time:

>>> set_1 = set([5, 2, 7, 2, 1, 88]) >>> set_2 = set([5, 2, 7, 2, 1, 88]) >>> set_1 set([88, 1, 2, 5, 7]) >>> set_2 set([88, 1, 2, 5, 7]) 

...and another example:

>>> set_3 = set('abracadabra') >>> set_4 = set('abracadabra') >>> set_3 set(['a', 'r', 'b', 'c', 'd']) >>>> set_4 set(['a', 'r', 'b', 'c', 'd']) 

I'm just curious why this would be. Any help?

like image 935
ivan Avatar asked Aug 28 '12 18:08

ivan


People also ask

Is set ordered or unordered in Python?

Sets are unordered. Set elements are unique. Duplicate elements are not allowed. A set itself may be modified, but the elements contained in the set must be of an immutable type.

What determines the order of a set in Python?

AFAIK Python sets are implemented using a hash table. The order in which the items appear depends on the hash function used.

Does the order of a set matter in Python?

Unlike in a standard set, the order of the data in an ordered set is preserved. We used ordered sets when we needed the order in which we entered the data to be maintained over the course of the program. In an ordered set, looking at the data does not change its order as it would in an unordered set.

What is ordered and unordered list in Python?

In Python, you have heard that lists, strings and tuples are ordered collection of objects and sets and dictionaries are unordered collection of objects. If we look at the output for strings, lists and tuples, they are in same order as they are specified intially.


2 Answers

You should watch this video (although it is CPython1 specific and about dictionaries -- but I assume it applies to sets as well).

Basically, python hashes the elements and takes the last N bits (where N is determined by the size of the set) and uses those bits as array indices to place the object in memory. The objects are then yielded in the order they exist in memory. Of course, the picture gets a little more complicated when you need to resolve collisions between hashes, but that's the gist of it.

Also note that the order that they are printed out is determined by the order that you put them in (due to collisions). So, if you reorder the list you pass to set_2, you might get a different order out if there are key collisions.

For example:

list1 = [8,16,24] set(list1)        #set([8, 16, 24]) list2 = [24,16,8] set(list2)        #set([24, 16, 8]) 

Note the fact that the order is preserved in these sets is "coincidence" and has to do with collision resolution (which I don't know anything about). The point is that the last 3 bits of hash(8), hash(16) and hash(24) are the same. Because they are the same, collision resolution takes over and puts the elements in "backup" memory locations instead of the first (best) choice and so whether 8 occupies a location or 16 is determined by which one arrived at the party first and took the "best seat".

If we repeat the example with 1, 2 and 3, you will get a consistent order no matter what order they have in the input list:

list1 = [1,2,3] set(list1)      # set([1, 2, 3]) list2 = [3,2,1] set(list2)      # set([1, 2, 3]) 

since the last 3 bits of hash(1), hash(2) and hash(3) are unique.


1Note The implementation described here applies to CPython dict and set. I think that the general description is valid for all modern versions of CPython up to 3.6. However, starting with CPython3.6, there is an additional implementation detail that actually preserves the insertion order for iteration for dict. It appears that set still do not have this property. The data structure is described by this blog post by the pypy folks (who started using this before the CPython folks). The original idea (at least for the python ecosystem) is archived on the python-dev mailing list.

like image 79
mgilson Avatar answered Sep 22 '22 04:09

mgilson


The reason of such behavior is than Python use hash tables for dictionary implementation: https://en.wikipedia.org/wiki/Hash_table#Open_addressing

Position of the key is defined by it's memory address. If you know Python reuse memory for some objects:

>>> a = 'Hello world' >>> id(a) 140058096568768 >>> a = 'Hello world' >>> id(a) 140058096568480 

You can see that object a has different address every time it's init.

But for small integers it isn't change:

>>> a = 1 >>> id(a) 40060856 >>> a = 1 >>> id(a) 40060856 

Even if we create second object with different name it would be the same:

>>> b = 1 >>> id(b) 40060856 

This approach allow to save memory which Python interpreter consume.

like image 23
Eugene Soldatov Avatar answered Sep 18 '22 04:09

Eugene Soldatov