Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Python's set difference method take time with an empty set?

Here is what I mean:

> python -m timeit "set().difference(xrange(0,10))"   
1000000 loops, best of 3: 0.624 usec per loop

> python -m timeit "set().difference(xrange(0,10**4))"
10000 loops, best of 3: 170 usec per loop

Apparently python iterates through the whole argument, even if the result is known to be the empty set beforehand. Is there any good reason for this? The code was run in python 2.7.6.

(Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.)

like image 504
dafinguzman Avatar asked Sep 07 '16 19:09

dafinguzman


People also ask

Can Python sets be empty?

To create an empty set in python we have to use the set() function without any arguments, if we will use empty curly braces ” {} ” then we will get an empty dictionary. After writing the above code (create an empty set in python), Ones you will print “type(x)” then the output will appear as a “ <class 'set'> ”.

What does .difference do in Python?

Python Set difference() Method The difference() method returns a set that contains the difference between two sets. Meaning: The returned set contains items that exist only in the first set, and not in both sets.

Is an empty set True or false Python?

Example 1: Using the not Operator Then we used a not operator to reverse the false value. In python, an empty set always evaluates to false. So when we passed an empty set to the if condition it'll be evaluated to false. But the not operator reverses the false value to true value.

How do you subtract one set to another in Python?

The difference between the two sets in Python is equal to the difference between the number of elements in two sets. The function difference() returns a set that is the difference between two sets.


2 Answers

Is there any good reason for this?

Having a special path for the empty set had not come up before.

Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.

This is a reasonable optimization request. I've made a patch and will apply it shortly. Here are the new timings with the patch applied:

 $ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.104 usec per loop
 $ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.105 usec per loop
 $ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0659 usec per loop
 $ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0684 usec per loop
like image 183
Raymond Hettinger Avatar answered Oct 12 '22 14:10

Raymond Hettinger


IMO it's a matter of specialisation, consider:

In [18]: r = range(10 ** 4)

In [19]: s = set(range(10 ** 4))

In [20]: %time set().difference(r)
CPU times: user 387 µs, sys: 0 ns, total: 387 µs
Wall time: 394 µs
Out[20]: set()

In [21]: %time set().difference(s)
CPU times: user 10 µs, sys: 8 µs, total: 18 µs
Wall time: 16.2 µs
Out[21]: set()

Apparently difference has specialised implementation for set - set.

Note that difference operator requires right hand argument to be a set, while difference allows any iterable.

Per @wim implementation is at https://github.com/python/cpython/blob/master/Objects/setobject.c#L1553-L1555

like image 27
Dima Tisnek Avatar answered Oct 12 '22 14:10

Dima Tisnek