Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

itertools.islice implementation -- efficiently slicing a list

Earlier, I was trying to answer a question where I wanted to iterate over a list slice as efficiently as possible.

for x in lst[idx1:]:

isn't ideal as it creates a copy (In general, this is O(n)). My next thought was to use itertools.islice. But if you look at the documentation, it appears that islice will call next until it finds the index it is looking for at which point it will start to yield values. This is also O(n). It seems that there is an optimization that is available here if the object passed to islice is a list or a tuple -- It seems that you could iterate over the "slice" directly (in C) without actually making a copy. I was curious if this optimization is in the source, But I didn't find anything. I'm not extremely familiar with C and the python source tree, so it's entirely possible that I missed it.

My question is this:

Is there a way to iterate over a list "slice" without making a copy of the list slice and without burning through a bunch of unwanted elements (in an optimized C implementation)?

I'm well aware that I could write my own generator for this (very naively, not accounting for the fact that many of the arguments should be optional, etc.):

def myslice(obj,start,stop,stride):
    for i in xrange(start,stop,stride):
        yield obj[i]

but that's definitely not going to beat an optimized C implementation.


If you're wondering why I would need this over just looping over a slice directly, consider the difference between:

takewhile(lambda x: x == 5, lst[idx:])  #copy's the tail of the list unnecessarily

and

takewhile(lambda x: x == 5, islice(lst,idx,None)) #inspects the head of the list unnecessarily 

and finally:

takewhile(lambda x: x == 5, magic_slice(lst,idx,None)) #How to create magic_slice???
like image 522
mgilson Avatar asked Nov 29 '12 15:11

mgilson


2 Answers

I think it's worth mentioning that NumPy slices are non-copying (they create a view onto the underlying array). Therefore, if you can use NumPy arrays for your data, that would solve the problem. On top of that you could get additional performance improvements through vectorization.

like image 197
NPE Avatar answered Sep 17 '22 15:09

NPE


Is there a way to iterate over a list "slice" without making a copy of the list slice and without burning through a bunch of unwanted elements (in an optimized C implementation)?

Yes there is, if you write that C implementation. Cython makes this particularly easy.

cdef class ListSlice(object):
    cdef object seq
    cdef Py_ssize_t start, end

    def __init__(self, seq, Py_ssize_t start, Py_ssize_t end):
        self.seq = seq
        self.start = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self.start == self.end:
            raise StopIteration()
        r = self.seq[self.start]
        self.start += 1
        return r
like image 26
Fred Foo Avatar answered Sep 17 '22 15:09

Fred Foo