Python 3 brings the yield from semantics. As far as I understand, it's supposed to yield to the outermost generator, in which case I'd expect this code to be linear in N.
from collections.abc import Iterable

def flatten(L):
    for e in L:
        if isinstance(e, Iterable):
            yield from flatten(e)
        else:
            yield e
N = 100
L = [-1]
for i in range(N):
    L = [i, [L], i]

for i in range(100):
    f = list(flatten(L))

print(len(f))
If I set N=200, however, the computation time is about four times longer, suggesting flatten is quadratic in the length of L. I can't understand why it would be, since the code visits each element only once and the yield from keyword should be sending values directly from the inner generator to where the terms are collected into the list. Is this a bug, simply not intended, or am I using it wrong? Is there a nice way to do O(N) flattening in Python?
What is yield in Python? The yield keyword is similar to a return statement, with one key difference: a function containing yield hands its caller a generator object instead of simply returning a value. yield from yields each item of an iterable, whereas a plain yield of that iterable would yield it as a single value. You can assign the generator to a variable and drive it with next(): the code in the function runs up to a yield, at which point execution is suspended and the yielded value is returned to the caller; the next call resumes from where it left off. A yield statement can therefore produce values many times, while a return statement ends the function and hands back a value once.
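A quick illustration of the difference (a made-up example, not from the original post):

def yields_each():
    yield from [1, 2, 3]   # delegates: yields 1, then 2, then 3

def yields_whole():
    yield [1, 2, 3]        # yields the whole list as a single value

print(list(yields_each()))    # [1, 2, 3]
print(list(yields_whole()))   # [[1, 2, 3]]

g = yields_each()
print(next(g))  # 1: runs up to the first yielded value, then suspends
print(next(g))  # 2: resumes from where it left off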
yield from, just like for item in x: yield item, is linear. However, function calls are slow, and because of the nesting in your L, when you double N you're not merely doubling the number of terms, you're also doubling the number of generator calls needed: every value produced at nesting depth d has to be handed up through d stacked generator frames before it reaches the caller. Anything which scales with the number of calls (function call overhead itself, especially due to the recursion, any yield from overhead, for loop initialization overhead, whatever) would therefore cause a problem. If this is right, then we'd expect that a list with the same number of elements and no nesting would flatten in linear time, mid-nesting would be in between, and lots of nesting would be super-slow. And that's what we see:
import timeit

s = """
from collections.abc import Iterable

def flatten(l):
    for e in l:
        if isinstance(e, Iterable):
            yield from flatten(e)
        else:
            yield e

def build_lots(N):
    l = [-1]
    for i in range(N):
        l = [i, [l], i]
    return l

def build_some(N):
    l = [-1]
    for i in range(N):
        l = [i] + l + [i] if i % 2 else [i, [l], i]
    return l

def build_none(N):
    return range(2*N+1)
"""
def get_time(size, which_build, n=100):
    setup = s + "l = build_{}({});".format(which_build, size)
    ans = timeit.Timer(setup=setup, stmt="z=list(flatten(l))").timeit(n)
    return ans

print('length', 'none', 'some', 'lots')
for w in range(0, 500, 50):
    print(2*w+1,
          get_time(w, 'none'),
          get_time(w, 'some'),
          get_time(w, 'lots'))
produces
length none some lots
1 0.0012789969332516193 0.0006600483320653439 0.000653265044093132
101 0.030214487109333277 0.06863495009019971 0.10554128512740135
201 0.05980244372040033 0.188231083098799 0.3237948380410671
301 0.08960435865446925 0.3752179825678468 0.6493003228679299
401 0.11986956000328064 0.6066137161105871 1.147628225851804
501 0.14737469609826803 0.9323012446984649 1.7087256000377238
601 0.18555369088426232 1.2575508910231292 2.2957410947419703
701 0.20820995513349771 1.712264522910118 3.094764341134578
801 0.23618148919194937 2.100640726275742 4.079551971051842
901 0.26863432209938765 2.617169467266649 4.858607416041195
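The per-element cost of deep yield from chains can also be seen in isolation. Here is a small sketch of my own (the chain helper is not from the original post): in CPython at least, each next() on the outermost generator is forwarded down through every yield from frame in the chain, so producing the same number of values takes longer the deeper the chain is.

import timeit

def chain(depth, n):
    # Produce n values at the bottom of a yield from chain `depth` levels deep.
    if depth == 0:
        yield from range(n)
    else:
        yield from chain(depth - 1, n)

# Same number of values each run; only the delegation depth changes.
for depth in (0, 100, 200, 400):
    t = timeit.timeit(lambda: list(chain(depth, 10000)), number=10)
    print(depth, t)

The reported times should grow roughly linearly with depth, which is exactly the per-call overhead that makes the recursive flatten quadratic overall.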
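As for the question's last part, getting O(N) flattening: one option is to replace the recursion with an explicit stack of iterators, so every element is yielded through a single generator frame instead of being handed up a chain of them. This is a sketch of mine rather than code from the thread; the str/bytes guard is an extra precaution (the recursive flatten above would loop forever on strings, since iterating a one-character string yields that string again).

from collections.abc import Iterable

def flatten_iter(l):
    stack = [iter(l)]
    while stack:
        for e in stack[-1]:
            if isinstance(e, Iterable) and not isinstance(e, (str, bytes)):
                stack.append(iter(e))  # descend into the nested iterable
                break                  # resume it as the new stack top
            yield e
        else:
            stack.pop()  # this level is exhausted

Each element is pushed and popped at most once, so list(flatten_iter(L)) scales linearly with the total number of elements, even for the deeply nested L built in the question.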