Why does Python's set difference method take time with an empty set?

Tags: performance, python, operators, set

Here is what I mean:

> python -m timeit "set().difference(xrange(0,10))"   
1000000 loops, best of 3: 0.624 usec per loop

> python -m timeit "set().difference(xrange(0,10**4))"
10000 loops, best of 3: 170 usec per loop

Apparently Python iterates through the whole argument, even when the result is known in advance to be the empty set. Is there any good reason for this? The code was run in Python 2.7.6.

(Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.)
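The early exit suggested above can be sketched in pure Python. This is a hypothetical helper (difference_with_early_exit and the counting generator are illustrative names, not standard-library functions), assuming the base set is copied and then trimmed: it never touches the argument when the base set is empty, and it stops pulling items as soon as nothing is left to remove.

```python
from typing import Hashable, Iterable, Set


def difference_with_early_exit(base: Set[Hashable], other: Iterable[Hashable]) -> Set[Hashable]:
    """Hypothetical helper: return base minus the items of other, stopping early."""
    result = set(base)  # work on a copy so `base` is left unchanged
    if not result:
        # Nothing can be removed from an empty set, so never iterate `other`.
        return result
    for item in other:
        result.discard(item)
        if not result:
            # Every element of the base set is gone; the rest of `other`
            # cannot change the answer, so stop iterating.
            break
    return result


def counting(iterable, counter):
    """Hypothetical helper: yield from `iterable`, counting consumed items in counter[0]."""
    for item in iterable:
        counter[0] += 1
        yield item


consumed = [0]
print(difference_with_early_exit(set(), counting(range(10 ** 4), consumed)), consumed[0])
# set() 0

consumed = [0]
print(difference_with_early_exit({0, 1, 2}, counting(range(10 ** 4), consumed)), consumed[0])
# set() 3
```

One trade-off of stopping early: items after the exit point are never examined, so a generator argument is left partially consumed and an unhashable element appearing later in the argument would no longer raise TypeError.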


asked Sep 07 '16 19:09

dafinguzman

2 Answers

> Is there any good reason for this?

Having a special path for the empty set had not come up before.

> Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.

This is a reasonable optimization request. I've made a patch and will apply it shortly. Here are the new timings with the patch applied:

$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.104 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.105 usec per loop
$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0659 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0684 usec per loop
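The shell timings above can also be reproduced from inside Python with the standard timeit module. This is a sketch: time_case is a hypothetical helper name, and number=1000 is an arbitrary fixed loop count (python -m timeit picks its own). Absolute numbers will vary by machine and Python version, and an interpreter without the empty-set fast path will report much larger figures for the range cases.

```python
import timeit

# Statement/setup pairs mirroring the four shell timings above.
CASES = [
    ("s.difference(r)", "r = range(10 ** 4); s = set()"),
    ("s.difference(r)", "r = set(range(10 ** 4)); s = set()"),
    ("s.difference_update(r)", "r = range(10 ** 4); s = set()"),
    ("s.difference_update(r)", "r = set(range(10 ** 4)); s = set()"),
]


def time_case(stmt, setup, number=1000, repeat=3):
    """Hypothetical helper: best-of-`repeat` time per call, in microseconds."""
    # timeit.repeat returns the total time for `number` calls, once per repeat.
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e6


if __name__ == "__main__":
    for stmt, setup in CASES:
        usec = time_case(stmt, setup)
        print(f"{stmt:<24} setup: {setup:<36} {usec:.3f} usec per loop")
```

Taking the minimum over repeats mirrors the "best of N" figure the command-line tool reports, which filters out noise from other processes rather than averaging it in.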


answered Oct 12 '22 14:10

Raymond Hettinger

IMO it's a matter of specialisation. Consider:

In [18]: r = range(10 ** 4)

In [19]: s = set(range(10 ** 4))

In [20]: %time set().difference(r)
CPU times: user 387 µs, sys: 0 ns, total: 387 µs
Wall time: 394 µs
Out[20]: set()

In [21]: %time set().difference(s)
CPU times: user 10 µs, sys: 8 µs, total: 18 µs
Wall time: 16.2 µs
Out[21]: set()

Apparently difference() has a specialised implementation for the case where the argument is also a set.

Note that the difference operator (-) requires its right-hand operand to be a set (or frozenset), while the difference() method accepts any iterable.
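A short illustration of that distinction in standard Python 3, using a list as the non-set argument:

```python
# The method form accepts any iterable, not just sets.
method_result = {1, 2, 3}.difference([2, 3])
print(method_result)  # {1}

# The operator form requires both operands to be sets (set or frozenset).
operator_result = {1, 2, 3} - frozenset({2})
print(operator_result)  # {1, 3}

# Passing a list to the operator is rejected.
operator_error = None
try:
    {1, 2, 3} - [2, 3]
except TypeError as exc:
    operator_error = exc
print(type(operator_error).__name__)  # TypeError
```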

Per @wim, the implementation is at https://github.com/python/cpython/blob/master/Objects/setobject.c#L1553-L1555
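A simplified pure-Python model of why the set-argument case is cheap. It is loosely based on the strategy CPython's set difference uses: when the argument is a set, walk the receiver and test membership in the argument; otherwise copy the receiver and discard each item of the argument. The function name model_difference and its step counter are hypothetical, added only to make the amount of work visible. CPython also treats an exact dict argument like a set and newer versions add size-based heuristics; neither is modelled here.

```python
from typing import Hashable, Iterable, Set, Tuple


def model_difference(base: Set[Hashable], other: Iterable[Hashable]) -> Tuple[Set[Hashable], int]:
    """Return (base minus other, number of loop iterations performed).

    Hypothetical model for illustration, not CPython's actual code.
    """
    steps = 0
    if isinstance(other, (set, frozenset)):
        # Set argument: iterate over `base`, keeping elements not in `other`.
        # The cost scales with len(base), so an empty base costs nothing.
        result = set()
        for item in base:
            steps += 1
            if item not in other:
                result.add(item)
        return result, steps
    # Any other iterable: copy `base`, then discard every item of `other`.
    # The cost scales with the length of `other`, however small `base` is.
    result = set(base)
    for item in other:
        steps += 1
        result.discard(item)
    return result, steps


print(model_difference(set(), set(range(10 ** 4))))  # (set(), 0)
print(model_difference(set(), range(10 ** 4)))       # (set(), 10000)
```

This matches the shape of the IPython timings above: with an empty receiver, the set argument does no per-element work, while the range argument is iterated in full.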

answered Oct 12 '22 14:10

Dima Tisnek
