I have a list of elements that is unique by construction, i.e. no element appears more than once in the list.
I want to efficiently test whether an item is present in that list, and to do this for many items.
The test is more efficient if I convert the list into a set.
Now my questions are about how to efficiently build the set.
I guess that when I do my_set = set(my_list), Python somehow has to test the membership of each item as it progressively constructs the set.
Is this sub-optimal, given that I already know the list contains no duplicates?
Is it possible to do better?
Does the answer to the question above change if instead of a list, I have an iterator (about which I still know that the items it will yield will be unique)?
Python doesn't do a separate membership test when constructing a set. It doesn't need to: sets are backed by a hash table, so each member is stored at a slot derived from its hash value. In constructing a set, all Python does is hash each element in turn and insert it at the corresponding slot; an equality comparison only happens when two elements land on the same slot with equal hashes.
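You can see this for yourself with a small sketch: the `Tracked` wrapper class below (a hypothetical name for illustration) counts calls to `__hash__` and `__eq__` while `set()` consumes a duplicate-free list.

```python
# Hypothetical demo class: counts how often set() construction calls
# __hash__ and __eq__ on its elements.
class Tracked:
    hash_calls = 0
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        Tracked.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        Tracked.eq_calls += 1
        return self.value == other.value

items = [Tracked(i) for i in range(1000)]
s = set(items)

print(Tracked.hash_calls)  # one hash per element: 1000
print(Tracked.eq_calls)    # equality checks only on hash collisions
```

With small distinct integers the hashes are all distinct, so in CPython the equality count stays at zero: every element is hashed exactly once, and no membership-style comparison pass occurs.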
The Python docs on time complexity don't explicitly list set construction, but they do say that most set operations match their dict counterparts, and insertion into a dict is O(1) on average, from which we can conclude that set construction is O(n).
Since set() uses a hash table (see How is set() implemented?), the time goes into hashing rather than comparing, and the hashing is unavoidable: knowing the input is duplicate-free doesn't let you skip it.
I assume your dataset is very large if you're this concerned about performance. The only way you're going to get better performance is by building the set() directly from the iterator in the first place, avoiding the intermediate list() and its memory usage. This also answers your last question: an iterator works just as well as a list here, and saves memory.
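As a sketch (with a made-up generator standing in for your real data source), feeding a generator straight into set() builds the set incrementally without ever materializing the full list:

```python
# Hypothetical data source: stands in for values read from a file, DB, etc.
def read_values(n):
    for i in range(n):
        yield i * 2

# Builds the set incrementally; peak memory holds only the set itself.
unique = set(read_values(100000))

# The list-based alternative holds the list and the set at once.
as_list = list(read_values(100000))
unique_via_list = set(as_list)

assert unique == unique_via_list
print(len(unique))  # 100000
```

Both forms produce the same set; the difference is only the transient memory spent on the intermediate list, which is what the timings below measure.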
$ python3 -m timeit 'set(list(range(100000)))'
100 loops, best of 3: 8.69 msec per loop
$ python3 -m timeit 'set(range(100000))'
100 loops, best of 3: 7.67 msec per loop
$ python3 -m timeit 'frozenset(range(100000))'
100 loops, best of 3: 7.68 msec per loop