 

Slice endpoints invisibly truncated

>>> import sys
>>> class Potato(object):
...     def __getslice__(self, start, stop):
...         print start, stop
...
>>> sys.maxint
9223372036854775807
>>> x = sys.maxint + 69
>>> print x
9223372036854775876
>>> Potato()[123:x]
123 9223372036854775807

Why doesn't the call to __getslice__ respect the stop I sent in, instead silently substituting 2^63 - 1? Does this mean that implementing __getslice__ for your own syntax will generally be unsafe with longs?

I can do whatever I need with __getitem__ anyway; I'm just wondering why __getslice__ is apparently broken.

Edit: Where is the code in CPython that truncates the slice? Is this part of the Python (language) spec or just a "feature" of CPython (the implementation)?

asked Sep 01 '14 01:09 by wim


1 Answer

The CPython C code that handles slicing for objects implementing the sq_slice slot cannot handle integers larger than the maximum value of a Py_ssize_t (PY_SSIZE_T_MAX, exposed as sys.maxsize). The sq_slice slot is the C-API equivalent of the __getslice__ special method.

For a two-element slice, Python 2 uses one of the SLICE+* opcodes; this is then handled by the apply_slice() function, which uses the _PyEval_SliceIndex() function to convert the Python index objects (int, long, or anything implementing the __index__ method) to Py_ssize_t integers. That function carries the following comment:

/* Extract a slice index from a PyInt or PyLong or an object with the
   nb_index slot defined, and store in *pi.
   Silently reduce values larger than PY_SSIZE_T_MAX to PY_SSIZE_T_MAX,
   and silently boost values less than -PY_SSIZE_T_MAX-1 to -PY_SSIZE_T_MAX-1.
   Return 0 on error, 1 on success.
*/

This means that any slicing in Python 2 using the two-value syntax has its indices clamped to the range -sys.maxsize - 1 to sys.maxsize when a sq_slice slot is provided.
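
The clamping described in that comment can be modelled in pure Python. This is only a rough sketch: clamp_slice_index is a hypothetical name, not a CPython function, and it ignores the special handling of None that the real C code performs for omitted bounds.

```python
import operator
import sys

def clamp_slice_index(value):
    # Hypothetical model of the clamping described in the C comment;
    # it is not the actual CPython implementation.
    index = operator.index(value)   # ints, longs, or objects with __index__
    upper = sys.maxsize             # stands in for PY_SSIZE_T_MAX
    lower = -sys.maxsize - 1        # stands in for -PY_SSIZE_T_MAX-1
    return max(lower, min(index, upper))

# The stop value from the question is silently reduced to sys.maxsize.
clamped = clamp_slice_index(sys.maxsize + 69)
```

With the value from the question, clamped ends up equal to sys.maxsize, which matches the stop that __getslice__ printed.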

Slicing using the three-value form (item[start:stop:stride]) uses the BUILD_SLICE opcode (followed by BINARY_SUBSCR) instead; this creates a slice() object without clamping the values to sys.maxsize.

If the object doesn't implement the sq_slice slot (so no __getslice__ is present), the apply_slice() function also falls back to using a slice() object.

As for this being an implementation detail or part of the language: the Slicings expression documentation distinguishes between simple_slicing and extended_slicing; the former only permits the short_slice form. For simple slicing the indices must be plain integers:

The lower and upper bound expressions, if present, must evaluate to plain integers; defaults are zero and the sys.maxint, respectively.

This suggests that Python 2 the language limits the indices to sys.maxint values, disallowing long integers. In Python 3 simple slicing has been excised from the language altogether.

If your code has to support slicing with values beyond sys.maxsize and you have to inherit from a type that implements __getslice__, then your options are to:

  • use the three-value syntax, with None for the stride:

    Potato()[123:x:None]
    
  • create slice() objects explicitly:

    Potato()[slice(123, x)]
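
As a quick check of both options, here is a minimal sketch using a variant of the Potato class from the question that defines both special methods. The returned tuples and the names big, via_stride and via_slice are illustrative assumptions. Under Python 2 only the two-value form Potato()[123:big] would go to __getslice__; Python 3 never consults __getslice__ at all.

```python
import sys

class Potato(object):
    # Illustrative variant of the class from the question.
    def __getslice__(self, start, stop):
        # Only consulted by Python 2, and only for the two-value form obj[a:b].
        return ("__getslice__", start, stop)

    def __getitem__(self, key):
        # Receives a slice object; its start and stop are passed through as-is.
        return ("__getitem__", key.start, key.stop)

big = sys.maxsize + 69

via_stride = Potato()[123:big:None]    # three-value syntax with a None stride
via_slice = Potato()[slice(123, big)]  # explicit slice() object
```

Both spellings reach __getitem__ with the stop value intact, so neither is subject to the sys.maxsize clamping.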
    

slice() objects can handle long integers just fine; however, the slice.indices() method still cannot handle lengths over sys.maxsize:

>>> import sys
>>> s = slice(0, sys.maxsize + 1)
>>> s
slice(0, 9223372036854775808L, None)
>>> s.stop
9223372036854775808L
>>> s.indices(sys.maxsize + 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot fit 'long' into an index-sized integer
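
If the normalised indices are needed for such a length anyway, one way around the limit is to do the arithmetic in pure Python. The following is a sketch only: big_indices is a hypothetical helper, not part of the standard library, and it is intended to follow the usual slice.indices() rules (negative values count from the end, then everything is clipped to the sequence bounds).

```python
import operator
import sys

def big_indices(s, length):
    # Hypothetical helper: computes a (start, stop, step) triple for slice s
    # using plain integer arithmetic, so the values are not limited to
    # index-sized integers.
    length = operator.index(length)
    step = 1 if s.step is None else operator.index(s.step)
    if length < 0:
        raise ValueError("length should not be negative")
    if step == 0:
        raise ValueError("slice step cannot be zero")

    # Bounds that start and stop are clipped to, depending on direction.
    lower = -1 if step < 0 else 0
    upper = length - 1 if step < 0 else length

    def resolve(value, default):
        if value is None:
            return default
        value = operator.index(value)
        # Negative indices count from the end before clipping.
        return max(value + length, lower) if value < 0 else min(value, upper)

    start = resolve(s.start, upper if step < 0 else lower)
    stop = resolve(s.stop, lower if step < 0 else upper)
    return start, stop, step

s = slice(0, sys.maxsize + 1)
result = big_indices(s, sys.maxsize + 2)
```

For the slice and length from the session above, result is (0, sys.maxsize + 1, 1) rather than an OverflowError.
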
answered Sep 19 '22 13:09 by Martijn Pieters