 

Is Python `list.extend(iterator)` guaranteed to be lazy?

Summary

Suppose I have an iterator that, as elements are consumed from it, performs some side effect, such as modifying a list. If I define a list l and call l.extend(iterator), is it guaranteed that extend will push elements onto l one-by-one, as elements from the iterator are consumed, as opposed to kept in a buffer and then pushed on all at once?

My experiments

I did a quick test in Python 3.7 on my computer, and list.extend seems to be lazy based on that test. (See code below.) Is this guaranteed by the spec, and if so, where in the spec is that mentioned?

(Also, feel free to criticize me and say "this is not Pythonic, you fool!"--though I would appreciate it if you also answer the question if you want to criticize me. Part of why I'm asking is for my own curiosity.)

Say I define an iterator that pushes onto a list as it runs:

l = []

def iterator(k):
  for i in range(5):
    print([j in k for j in range(5)])
    yield i

l.extend(iterator(l))

Here are examples of non-lazy (i.e. buffered) vs. lazy possible extend implementations:

def extend_nonlazy(l, iterator):
  l += list(iterator)

def extend_lazy(l, iterator):
  for i in iterator:
    l.append(i)

Results

Here's what happens when I run each of the two candidate implementations of extend.


Non-lazy:

l = []
extend_nonlazy(l, iterator(l))
# output
[False, False, False, False, False]
[False, False, False, False, False]
[False, False, False, False, False]
[False, False, False, False, False]
[False, False, False, False, False]

# l = [0, 1, 2, 3, 4]

Lazy:

l = []
extend_lazy(l, iterator(l))
[False, False, False, False, False]
[True, False, False, False, False]
[True, True, False, False, False]
[True, True, True, False, False]
[True, True, True, True, False]

My own experimentation shows that native list.extend seems to work like the lazy version, but my question is: does the Python spec guarantee that?
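One compact way to probe this (a sketch of my test above; it only observes CPython's behavior, which is exactly what I'm asking the spec to guarantee) is to have the generator record the target list's length at every yield. If extend appends one-by-one, the recorded lengths grow; if it buffers, they stay at zero:

```python
# Sketch: snapshot the target list's length from inside the generator to see
# whether list.extend appends elements as they are produced (observed CPython
# behavior, not a documented guarantee).

l = []
lengths_seen = []

def spying_iterator():
    for i in range(5):
        lengths_seen.append(len(l))  # how many items has extend pushed so far?
        yield i

l.extend(spying_iterator())
print(lengths_seen)  # [0, 1, 2, 3, 4] under CPython: one append per yield
print(l)             # [0, 1, 2, 3, 4]
```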

Asked Jul 25 '19 by Kye W Shi




2 Answers

I don't think the issue is lazy vs. non-lazy: in both slice assignment and list.extend, all the elements of the iterator are needed, and in the normal case they are consumed in a single pass. The issue you raised is more important: are these operations atomic or not? See the definition of "atomicity" in Wikipedia:

Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely, or fails completely.

Have a look at this example (CPython 3.6.8):

>>> def new_iterator(): return (1/(i-2) for i in range(5))
>>> L = []
>>> L[:] = new_iterator()
Traceback (most recent call last):
...
ZeroDivisionError: division by zero
>>> L
[]

The slice assignment failed because of the exception (i == 2 => 1/(i - 2) raises an exception) and the list was left unchanged. Hence, the slice assignment operation is atomic.

Now, the same example with extend:

>>> L.extend(new_iterator())
Traceback (most recent call last):
...
ZeroDivisionError: division by zero
>>> L
[-0.5, -1.0]

When the exception was raised, the first two elements had already been appended to the list. The extend operation is not atomic, since a failure does not leave the list unchanged.

Should the extend operation be atomic or not? Frankly I have no idea about that, but as written in @wim's answer, the real issue is that it's not clearly stated in the documentation (and worse, the documentation asserts that extend is equivalent to the slice assignment, which is not true in the reference implementation).
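If you do want atomic behavior from extend, one workaround (a sketch; `atomic_extend` is a name I'm introducing here, not a standard function) is to materialize the iterator into a temporary list first, so a failure never touches the target:

```python
# Sketch: get atomic extend behavior by consuming the iterator into a
# temporary list first. If the iterator raises, the target list is untouched.

def atomic_extend(target, iterable):
    items = list(iterable)   # may raise here; target is still unmodified
    target.extend(items)     # extending from a plain list cannot fail midway

failing = (1 / (i - 2) for i in range(5))
L = []
try:
    atomic_extend(L, failing)
except ZeroDivisionError:
    pass
print(L)  # [] -- the failure left the list unchanged
```

This trades atomicity for an extra pass and extra memory, which is presumably why the built-in extend doesn't do it.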

Answered Sep 26 '22 by jferard


Is Python list.extend(iterator) guaranteed to be lazy?

No. On the contrary, it's documented that

l.extend(iterable)

is equivalent to

l[len(l):] = iterable

In CPython, such a slice assignment will first convert a generator on the right hand side into a list anyway (see here), i.e. it's consuming the iterable all at once.
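You can observe this all-at-once consumption directly (a sketch against CPython; the spec makes no promise either way): have a generator snapshot the target list at each yield, then assign it via a slice. The target stays empty the whole time the generator runs:

```python
# Sketch: slice assignment consumes the entire iterable before the target
# list changes (observed CPython behavior).

L = []
snapshots = []

def spy():
    for i in range(3):
        snapshots.append(list(L))  # copy of L at the moment of each yield
        yield i

L[:] = spy()
print(snapshots)  # [[], [], []] -- L was untouched while the generator ran
print(L)          # [0, 1, 2]
```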

The example shown in your question is, strictly speaking, contradicting the documentation. I've filed a documentation bug, but it was promptly closed by Raymond Hettinger.

As an aside, there are less convoluted ways to demonstrate the discrepancy. Just define a failing generator:

def gen():
    yield 1
    yield 2
    yield 3
    uh-oh  # deliberate NameError once the first three items are consumed

Now L.extend(gen()) will modify L, but L[:] = gen() will not.
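Here is a runnable version of that check (I've swapped the bare NameError for an explicit RuntimeError so the failure can be caught cleanly; the effect on the list is the same):

```python
# Sketch: demonstrate that extend and slice assignment disagree on what
# happens to the list when the iterable fails partway through (CPython).

def gen():
    yield 1
    yield 2
    yield 3
    raise RuntimeError("uh-oh")  # fails after producing three items

L = []
try:
    L.extend(gen())
except RuntimeError:
    pass
after_extend = list(L)
print(after_extend)  # [1, 2, 3] -- extend kept the partial results

L = []
try:
    L[:] = gen()
except RuntimeError:
    pass
print(L)  # [] -- slice assignment left the list unchanged
```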

Answered Sep 22 '22 by wim