Here is what I mean:
> python -m timeit "set().difference(xrange(0,10))"
1000000 loops, best of 3: 0.624 usec per loop
> python -m timeit "set().difference(xrange(0,10**4))"
10000 loops, best of 3: 170 usec per loop
Apparently python iterates through the whole argument, even if the result is known to be the empty set beforehand. Is there any good reason for this? The code was run in python 2.7.6.
(Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.)
To create an empty set in python we have to use the set() function without any arguments, if we will use empty curly braces ” {} ” then we will get an empty dictionary. After writing the above code (create an empty set in python), Ones you will print “type(x)” then the output will appear as a “ <class 'set'> ”.
Python Set difference() Method The difference() method returns a set that contains the difference between two sets. Meaning: The returned set contains items that exist only in the first set, and not in both sets.
Example 1: Using the not Operator Then we used a not operator to reverse the false value. In python, an empty set always evaluates to false. So when we passed an empty set to the if condition it'll be evaluated to false. But the not operator reverses the false value to true value.
The difference between the two sets in Python is equal to the difference between the number of elements in two sets. The function difference() returns a set that is the difference between two sets.
Is there any good reason for this?
Having a special path for the empty set had not come up before.
Even for nonempty sets, if you find that you've removed all of the first set's elements midway through the iteration, it makes sense to stop right away.
This is a reasonable optimization request. I've made a patch and will apply it shortly. Here are the new timings with the patch applied:
$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.104 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference(r)"
10000000 loops, best of 3: 0.105 usec per loop
$ py -m timeit -s "r = range(10 ** 4); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0659 usec per loop
$ py -m timeit -s "r = set(range(10 ** 4)); s = set()" "s.difference_update(r)"
10000000 loops, best of 3: 0.0684 usec per loop
IMO it's a matter of specialisation, consider:
In [18]: r = range(10 ** 4)
In [19]: s = set(range(10 ** 4))
In [20]: %time set().difference(r)
CPU times: user 387 µs, sys: 0 ns, total: 387 µs
Wall time: 394 µs
Out[20]: set()
In [21]: %time set().difference(s)
CPU times: user 10 µs, sys: 8 µs, total: 18 µs
Wall time: 16.2 µs
Out[21]: set()
Apparently difference has specialised implementation for set - set
.
Note that difference operator requires right hand argument to be a set, while difference allows any iterable.
Per @wim implementation is at https://github.com/python/cpython/blob/master/Objects/setobject.c#L1553-L1555
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With