Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does assigning past the end of a list via a slice not raise an IndexError? [duplicate]

Tags:

I'm working on a sparse list implementation and recently implemented assignment via a slice. This led me to discover some behaviour in Python's built-in list implementation that I find suprising.

Given an empty list and an assignment via a slice:

>>> l = []
>>> l[100:] = ['foo']

I would have expected an IndexError from list here because the way this is implemented means that an item can't be retrieved from the specified index::

>>> l[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

'foo' cannot even be retrieved from the specified slice:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]

l[100:] = ['foo'] appends to the list (that is, l == ['foo'] after this assignment) and appears to have behaved this way since the BDFL's initial version. I can't find this functionality documented anywhere (*) but both CPython and PyPy behave this way.

Assigning by index raises an error:

>>> l[100] = 'bar'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

So why does assigning past the end of a list via a slice not raise an IndexError (or some other error, I guess)?


To clarify following the first two comments, this question is specifically about assignment, not retrieval (cf. Why substring slicing index out of range works in Python?).

Giving into the temptation to guess and assigning 'foo' to l at index 0 when I had explicitly specified index 100 doesn't follow the usual Zen of Python.

Consider the case where the assignment happens far away from the initialisation and the index is a variable. The caller can no longer retrieve their data from the specified location.

Assigning to a slice before the end of a list behaves somewhat differently to the example above:

>>> l = [None, None, None, None]
>>> l[3:] = ['bar']
>>> l[3:]
['bar']

(*) This behaviour is defined in Note 4 of 5.6. Sequence Types in the official documentation (thanks elethan) but it's not explained why it would be considered desirable on assignment.


Note: I understand how retrieval works and can see how it may be desirable to be consistent with this for assignment but am looking for a cited reason as to why assigning to a slice would behave in this way. l[100:] returning [] immediately after l[100:] = ['foo'] but l[3:] returning ['bar'] after l[3:] = ['bar'] is astonishing if you have no knowledge of len(l), particularly if you're following Python's EAFP idiom.

like image 983
Johnsyweb Avatar asked Nov 12 '16 01:11

Johnsyweb


People also ask

Does a slice of an entire list create a new object?

The short answerSlicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.

How does slicing work Python?

Python slice() FunctionThe slice() function returns a slice object. A slice object is used to specify how to slice a sequence. You can specify where to start the slicing, and where to end. You can also specify the step, which allows you to e.g. slice only every other item.

What happen if you use out of range index with slicing?

The slicing operation doesn't raise an error if both your start and stop indices are larger than the sequence length. This is in contrast to simple indexing—if you index an element that is out of bounds, Python will throw an index out of bounds error. However, with slicing it simply returns an empty sequence.

What is the purpose of list slicing?

To access a range of items in a list, you need to slice a list. One way to do this is to use the simple slicing operator : With this operator you can specify where to start the slicing, where to end and specify the step.


2 Answers

Let's see what is actually happening:

>>> l = []
>>> l[100:] = ['foo']
>>> l[100:]
[]
>>> l
['foo']

So the assignment was actually successful, and the item got placed into the list, as the first item.

Why this happens is because 100: in indexing position is converted to a slice object: slice(100, None, None):

>>> class Foo:
...     def __getitem__(self, i):
...         return i
... 
>>> Foo()[100:]
slice(100, None, None)

Now, the slice class has a method indices (I am not able to find its Python documentation online, though) that, when given a length of a sequence, will give (start, stop, stride) that is adjusted for the length of that sequence.

>>> slice(100, None, None).indices(0)
(0, 0, 1)

Thus when this slice is applied to a sequence of length 0, it behaves exactly like a slice slice(0, 0, 1) for slice retrievals, e.g. instead of foo[100:] throwing an error when foo is an empty sequence, it behaves as if foo[0:0:1] was requested - this will result on empty slice on retrieval.

Now the setter code should work correctly when l[100:] was used when l is a sequence that has more than 100 elements. To make it work there, the easiest is to not reinvent the wheel, and to just use the indices mechanism above. As a downside, it will now look a bit peculiar in edge cases, but slice assignments to slices that are "out of bounds" will be placed at the end of the current sequence instead. (However, it turns out that there is little code reuse in the CPython code; list_ass_slice essentially duplicates all this index handling, even though it would also be available via slice object C-API).

Thus: if start index of a slice is greater than or equal to the length of a sequence, the resulting slice behaves as if it is a zero-width slice starting from the end of the the sequence. I.e.: if a >= len(l), l[a:] behaves like l[len(l):len(l)] on built-in types. This is true for each of assignment, retrieval and deletion.

The desirability of this is in that it doesn't need any exceptions. The slice.indices method doesn't need to handle any exceptions - for a sequence of length l, slice.indices(l) will always result in (start, end, stride) of indices that can be used for any of assignment, retrieval and deletion, and it is guaranteed that both start and end are 0 <= v <= len(l).

like image 194

For indexing, an error must be raised if the given index is out-of-bounds, because there is no acceptable default value that could be returned. (It is not acceptable to return None, because None could be a valid element of the sequence).

By contrast, for slicing, raising an error is not necessary if any of the indexes are out-of-bounds, because it is acceptable to return an empty sequence as a default value. And it also desirable to do this, because it provides a consistent way refer to subsequences both between elements and beyond the ends of the sequence (thus allowing for insertions).

As stated in the Sequence Types Notes, if the start or end value of a slice is greater than len(seq), then len(seq) is used instead.

So given a = [4, 5, 6], the expressions a[3:] and a[100:] both point to the empty subsequence following the last element in the list. However, after a slice assignment using these expressions, they may no longer refer to the same thing, since the length of the list may have been changed.

Thus, after the asignment a[3:] = [7], the slice a[3:] will return [7]. But after the asignment a[100:] = [8], the slice a[100:] will still return [], because len(a) is still less than 100. And given everything else stated above, this is exactly what one should expect if consistency between slice assignment and slice retrieval is to be maintained.

like image 22
ekhumoro Avatar answered Oct 15 '22 15:10

ekhumoro