Why does assigning past the end of a list via a slice not raise an IndexError? [duplicate]

I'm working on a sparse list implementation and recently implemented assignment via a slice. This led me to discover some behaviour in Python's built-in list implementation that I find surprising.

Given an empty list and an assignment via a slice:

>>> l = []
>>> l[100:] = ['foo']

I would have expected an IndexError from list here because the way this is implemented means that an item can't be retrieved from the specified index:

>>> l[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

'foo' cannot even be retrieved from the specified slice:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]

l[100:] = ['foo'] appends to the list (that is, l == ['foo'] after this assignment) and appears to have behaved this way since the BDFL's initial version. I can't find this functionality documented anywhere (*) but both CPython and PyPy behave this way.

Assigning by index raises an error:

>>> l[100] = 'bar'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

So why does assigning past the end of a list via a slice not raise an IndexError (or some other error, I guess)?

To clarify following the first two comments, this question is specifically about assignment, not retrieval (cf. Why substring slicing index out of range works in Python?).

Giving in to the temptation to guess and assigning 'foo' to l at index 0 when I had explicitly specified index 100 doesn't follow the usual Zen of Python.

Consider the case where the assignment happens far away from the initialisation and the index is a variable. The caller can no longer retrieve their data from the specified location.

Assigning to a slice before the end of a list behaves somewhat differently to the example above:

>>> l = [None, None, None, None]
>>> l[3:] = ['bar']
>>> l[3:]
['bar']

(*) This behaviour is defined in Note 4 of 5.6. Sequence Types in the official documentation (thanks elethan) but it's not explained why it would be considered desirable on assignment.

Note: I understand how retrieval works and can see how it may be desirable to be consistent with this for assignment but am looking for a cited reason as to why assigning to a slice would behave in this way. l[100:] returning [] immediately after l[100:] = ['foo'] but l[3:] returning ['bar'] after l[3:] = ['bar'] is astonishing if you have no knowledge of len(l), particularly if you're following Python's EAFP idiom.

asked Nov 12 '16 01:11

Johnsyweb

2 Answers

Let's see what is actually happening:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]
>>> l
['foo']

So the assignment was actually successful, and the item got placed into the list, as the first item.

This happens because 100: in indexing position is converted to a slice object, slice(100, None, None):

>>> class Foo:
...     def __getitem__(self, i):
...         return i
... 
>>> Foo()[100:]
slice(100, None, None)

Now, the slice class has a method indices (I am not able to find its Python documentation online, though) that, when given the length of a sequence, returns a (start, stop, stride) tuple adjusted for the length of that sequence.

>>> slice(100, None, None).indices(0)
(0, 0, 1)

Thus when this slice is applied to a sequence of length 0, it behaves exactly like the slice slice(0, 0, 1) for slice retrievals, e.g. instead of foo[100:] throwing an error when foo is an empty sequence, it behaves as if foo[0:0:1] was requested, which results in an empty slice on retrieval.
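To make the clamping concrete, here is a small sketch that applies the same slice object to sequences of different lengths. The lengths 0, 3 and 150 are arbitrary values picked for illustration.

```python
# The same slice object, adjusted for sequences of different lengths.
s = slice(100, None, None)

empty = s.indices(0)     # start and stop both clamp to 0
short = s.indices(3)     # start and stop both clamp to 3
long_ = s.indices(150)   # start stays at 100, stop becomes 150

print(empty)   # (0, 0, 1)
print(short)   # (3, 3, 1)
print(long_)   # (100, 150, 1)
```

Only in the last case is the sequence long enough for index 100 to exist, so only there does the slice cover any elements.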

Now, the setter code has to work correctly for l[100:] when l is a sequence that has more than 100 elements. The easiest way to make that work is to not reinvent the wheel and to just use the indices mechanism above. The downside is that edge cases look a bit peculiar: assignments to slices that are "out of bounds" are placed at the end of the current sequence instead. (However, it turns out that there is little code reuse in the CPython code; list_ass_slice essentially duplicates all this index handling, even though it would also be available via the slice object C-API.)

Thus: if the start index of a slice is greater than or equal to the length of a sequence, the resulting slice behaves as if it is a zero-width slice starting from the end of the sequence. I.e.: if a >= len(l), l[a:] behaves like l[len(l):len(l)] on built-in types. This is true for each of assignment, retrieval and deletion.
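A quick way to check this equivalence is to run all three operations once with an out-of-range start and once with the explicit zero-width slice at the end. The helper names run_out_of_range and run_zero_width are made up for this sketch.

```python
def run_out_of_range(start):
    """Retrieve, delete and assign using l[start:] on a fresh list."""
    l = [1, 2, 3]
    retrieved = l[start:]
    del l[start:]
    after_delete = list(l)
    l[start:] = ['foo']
    return retrieved, after_delete, l


def run_zero_width():
    """The same three operations using the explicit slice l[len(l):len(l)]."""
    l = [1, 2, 3]
    n = len(l)
    retrieved = l[n:n]
    del l[n:n]
    after_delete = list(l)
    l[n:n] = ['foo']
    return retrieved, after_delete, l


print(run_out_of_range(100))                       # ([], [1, 2, 3], [1, 2, 3, 'foo'])
print(run_out_of_range(100) == run_zero_width())   # True
```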

The desirability of this lies in the fact that it doesn't need any exceptions. The slice.indices method doesn't need to handle any exceptions: for a sequence of length n, slice.indices(n) will always result in a (start, stop, stride) triple of indices that can be used for any of assignment, retrieval and deletion, and, at least for a positive stride, it is guaranteed that both start and stop satisfy 0 <= v <= n. (With a negative stride, stop can come out as -1, e.g. slice(None, None, -1).indices(5) gives (4, -1, -1).)
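The bounds guarantee can be spot-checked by brute force. This is only a sketch: the candidate values below are arbitrary, and it covers the default stride only, since a negative stride can yield a stop of -1.

```python
import itertools

# Arbitrary in-range, out-of-range and negative bounds to try.
candidates = [None, -1000, -3, 0, 2, 5, 1000]

checked = 0
for length in range(6):
    for start, stop in itertools.product(candidates, repeat=2):
        lo, hi, step = slice(start, stop).indices(length)
        # With the default stride, both bounds land inside [0, length].
        assert 0 <= lo <= length and 0 <= hi <= length
        assert step == 1
        checked += 1

print(checked)  # 294 combinations, none of them raised

# A negative stride is the exception to the 0 <= v rule:
reversed_bounds = slice(None, None, -1).indices(5)
print(reversed_bounds)  # (4, -1, -1)
```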

answered Oct 15 '22 17:10

Antti Haapala -- Слава Україні

For indexing, an error must be raised if the given index is out-of-bounds, because there is no acceptable default value that could be returned. (It is not acceptable to return None, because None could be a valid element of the sequence).

By contrast, for slicing, raising an error is not necessary if any of the indexes are out-of-bounds, because it is acceptable to return an empty sequence as a default value. It is also desirable to do this, because it provides a consistent way to refer to subsequences both between elements and beyond the ends of the sequence (thus allowing for insertions).
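As an illustration of the insertion point, assigning to an empty (zero-width) slice inserts at that position without replacing anything. The list contents here are arbitrary.

```python
a = [4, 5, 6]

a[1:1] = ['x']       # empty slice between elements 0 and 1
a[len(a):] = ['y']   # empty slice after the last element
a[:0] = ['w']        # empty slice before the first element

print(a)  # ['w', 4, 'x', 5, 6, 'y']
```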

As stated in the Sequence Types Notes, if the start or end value of a slice is greater than len(seq), then len(seq) is used instead.

So given a = [4, 5, 6], the expressions a[3:] and a[100:] both point to the empty subsequence following the last element in the list. However, after a slice assignment using these expressions, they may no longer refer to the same thing, since the length of the list may have been changed.

Thus, after the assignment a[3:] = [7], the slice a[3:] will return [7]. But after the assignment a[100:] = [8], the slice a[100:] will still return [], because len(a) is still less than 100. And given everything else stated above, this is exactly what one should expect if consistency between slice assignment and slice retrieval is to be maintained.
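The two assignments can be run back to back to see the difference. The variable names in this sketch are chosen for illustration only.

```python
a = [4, 5, 6]

a[3:] = [7]
after_first = a[3:]      # index 3 now exists, so this finds the new item

a[100:] = [8]            # clamped to the end, so 8 is appended
after_second = a[100:]   # len(a) is 5, so this is still the empty tail
actual_tail = a[4:]      # the item really landed at index 4

print(a)             # [4, 5, 6, 7, 8]
print(after_first)   # [7]
print(after_second)  # []
print(actual_tail)   # [8]
```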

answered Oct 15 '22 15:10

ekhumoro
