Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pythonic way to have a "size safe" slicing

Tags:

python

slice

Here is a quote from https://stackoverflow.com/users/893/greg-hewgill answer to Explain Python's slice notation.

Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.

So when the error is prefered, what is the Pythonic way to proceed ? Is there a more Pythonic way to rewrite this example ?

class ParseError(Exception):
    pass

def safe_slice(data, start, end):
    """0 <= start <= end is assumed"""
    r = data[start:end]
    if len(r) != end - start:
        raise IndexError
    return r

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.

    """

    try:
        name_length = ord(data[0])
        extracted_name = safe_slice(data, 1, 1 + name_length)
        phone_length = ord(data[1 + name_length])
        extracted_phone = safe_slice(data, 2 + name_length, 2 + name_length + phone_length)
    except IndexError:
        raise ParseError()
    return extracted_name, extracted_phone

if __name__ == '__main__':
    print lazy_parse("\x04Jack\x0A0123456789") # OK
    print lazy_parse("\x04Jack\x0A012345678") # should raise ParseError

edit: the example was simpler to write using byte strings but my real code is using lists.

like image 715
kasyc Avatar asked Nov 10 '11 09:11

kasyc


2 Answers

Here's one way that is arguably more Pythonic. If you want to parse a byte string you can use the struct module that is provided for that exact purpose:

import struct
from collections import namedtuple
Details = namedtuple('Details', 'name phone')

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.

    """
    try:
        name = struct.unpack_from("%dp" % len(data), data)[0]
        phone = struct.unpack_from("%dp" % (len(data)-len(name)-1), data, len(name)+1)[0]
    except struct.error:
        raise ParseError()
    return Details(name, phone)

What I still find unpythonic about that is throwing away the useful struct.error traceback to replace with a ParseError whatever that is: the original tells you what is wrong with the string, the latter only tells you that something is wrong.

like image 181
Duncan Avatar answered Sep 25 '22 21:09

Duncan


Using a function like safe_slice would be faster than creating an object just to perform the slice, but if speed is not a bottleneck and you are looking for a nicer interface, you could define a class with a __getitem__ to perform checks before returning the slice.

This allows you to use nice slice notation instead of having to pass both the start and stop arguments to safe_slice.

class SafeSlice(object):
    # slice rules: http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
    def __init__(self,seq):
        self.seq=seq
    def __getitem__(self,key):
        seq=self.seq
        if isinstance(key,slice):
            start,stop,step=key.start,key.stop,key.step
            if start:
                seq[start]
            if stop:
                if stop<0: stop=len(seq)+stop
                seq[stop-1]
        return seq[key]

seq=[1]
print(seq[:-2])
# []
print(SafeSlice(seq)[:-1])
# []
print(SafeSlice(seq)[:-2])
# IndexError: list index out of range

If speed is an issue, then I suggest just testing the end points instead of doing arithmetic. Item access for Python lists is O(1). The version of safe_slice below also allows you to pass 2,3 or 4 arguments. With just 2 arguments, the second will be interpreted as the stop value, (similar to range).

def safe_slice(seq, start, stop=None, step=1):
    if stop is None:
        stop=start
        start=0
    else:
        seq[start]
    if stop<0: stop=len(seq)+stop
    seq[stop-1]        
    return seq[start:stop:step]
like image 25
unutbu Avatar answered Sep 23 '22 21:09

unutbu