Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient iteration over slice in Python

How efficient are iterations over slice operations in Python? And if a copy is inevitable with slices, is there an alternative?

I know that a slice operation over a list is O(k), where k is the size of the slice.

x[5 : 5+k]  # O(k) copy operation

However, when iterating over a part of a list, I find that the cleanest (and most Pythonic?) way to do this (without having to resort to indices) is to do:

for elem in x[5 : 5+k]:
  print elem

However my intuition is that this still results in an expensive copy of the sublist, rather than simply iterating over the existing list.

like image 857
WaelJ Avatar asked Aug 04 '13 23:08

WaelJ


People also ask

Can we use for loop in slices?

Given this practice, you can now write a for loop to yield slice1 , slice2 , and slice3 in a row. We did it by defining a variable gap . There is always a gap of 5 between the starting index and the ending index because the slices we want are of length 5.

Is Python slicing inclusive?

Python is a zero-indexed language (things start counting from zero), and is also left inclusive, right exclusive you are when specifying a range of values. This applies to objects like lists and Series , where the first element has a position (index) of 0.

What is extended slicing in Python?

The slice syntax is a handy way to refer to sub-parts of sequences – typically strings and lists. The slice s[start:end] is the elements beginning at start and extending up to but not including end.

What does slice () Do Python?

Python slice() Function The slice() function returns a slice object. A slice object is used to specify how to slice a sequence. You can specify where to start the slicing, and where to end. You can also specify the step, which allows you to e.g. slice only every other item.


2 Answers

Use:

for elem in x[5 : 5+k]:

It's Pythonic! Don't change this until you've profiled your code and determined that this is a bottleneck -- though I doubt you will ever find this to be the main source of a bottleneck.


In terms of speed it will probably be your best choice:

In [30]: x = range(100)

In [31]: k = 90

In [32]: %timeit x[5:5+k]
1000000 loops, best of 3: 357 ns per loop

In [35]: %timeit list(IT.islice(x, 5, 5+k))
100000 loops, best of 3: 2.42 us per loop

In [36]: %timeit [x[i] for i in xrange(5, 5+k)]
100000 loops, best of 3: 5.71 us per loop

In terms of memory, it is not as bad you might think. x[5: 5+k] is a shallow copy of part of x. So even if the objects in x are large, x[5: 5+k] is creating a new list with k elements which reference the same objects as in x. So you only need extra memory to create a list with k references to pre-existing objects. That probably is not going to be the source of any memory problems.

like image 96
unutbu Avatar answered Oct 11 '22 06:10

unutbu


You can use itertools.islice to get a sliced iterator from the list:

Example:

>>> from itertools import islice
>>> lis = range(20)
>>> for x in islice(lis, 10, None, 1):
...     print x
...     
10
11
12
13
14
15
16
17
18
19

Update:

As noted by @user2357112 the performance of islice depends on the start point of slice and the size of the iterable, normal slice is going to be fast in almost all cases and should be preferred. Here are some more timing comparisons:

For Huge lists islice is slightly faster or equal to normal slice when the slice's start point is less than half the size of list, for bigger indexes normal slice is the clear winner.

>>> def func(lis, n):
        it = iter(lis)
        for x in islice(it, n, None, 1):pass
...     
>>> def func1(lis, n):
        #it = iter(lis)
        for x in islice(lis, n, None, 1):pass
...     
>>> def func2(lis, n):
        for x in lis[n:]:pass
...     
>>> lis = range(10**6)

>>> n = 100
>>> %timeit func(lis, n)
10 loops, best of 3: 62.1 ms per loop
>>> %timeit func1(lis, n)
1 loops, best of 3: 60.8 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 82.8 ms per loop

>>> n = 1000
>>> %timeit func(lis, n)
10 loops, best of 3: 64.4 ms per loop
>>> %timeit func1(lis, n)
1 loops, best of 3: 60.3 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 85.8 ms per loop

>>> n = 10**4
>>> %timeit func(lis, n)
10 loops, best of 3: 61.4 ms per loop
>>> %timeit func1(lis, n)
10 loops, best of 3: 61 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 80.8 ms per loop


>>> n = (10**6)/2
>>> %timeit func(lis, n)
10 loops, best of 3: 39.2 ms per loop
>>> %timeit func1(lis, n)
10 loops, best of 3: 39.6 ms per loop
>>> %timeit func2(lis, n)
10 loops, best of 3: 41.5 ms per loop

>>> n = (10**6)-1000
>>> %timeit func(lis, n)
100 loops, best of 3: 18.9 ms per loop
>>> %timeit func1(lis, n)
100 loops, best of 3: 18.8 ms per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 50.9 us per loop    #clear winner for large index
>>> %timeit func1(lis, n)

For Small lists normal slice is faster than islice for almost all cases.

>>> lis = range(1000)
>>> n = 100
>>> %timeit func(lis, n)
10000 loops, best of 3: 60.7 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 59.6 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 59.9 us per loop

>>> n = 500
>>> %timeit func(lis, n)
10000 loops, best of 3: 38.4 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 33.9 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 26.6 us per loop

>>> n = 900
>>> %timeit func(lis, n)
10000 loops, best of 3: 20.1 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 17.2 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 11.3 us per loop

Conclusion:

Go for normal slices.

like image 34
Ashwini Chaudhary Avatar answered Oct 11 '22 07:10

Ashwini Chaudhary