How efficient are iterations over slice operations in Python? And if a copy is inevitable with slices, is there an alternative?
I know that a slice operation over a list is O(k), where k is the size of the slice.
x[5 : 5+k] # O(k) copy operation
However, when iterating over a part of a list, I find that the cleanest (and most Pythonic?) way to do this (without having to resort to indices) is to do:
for elem in x[5 : 5+k]:
print elem
However my intuition is that this still results in an expensive copy of the sublist, rather than simply iterating over the existing list.
Given this practice, you can now write a for loop to yield slice1 , slice2 , and slice3 in a row. We did it by defining a variable gap . There is always a gap of 5 between the starting index and the ending index because the slices we want are of length 5.
Python is a zero-indexed language (things start counting from zero), and is also left inclusive, right exclusive you are when specifying a range of values. This applies to objects like lists and Series , where the first element has a position (index) of 0.
The slice syntax is a handy way to refer to sub-parts of sequences – typically strings and lists. The slice s[start:end] is the elements beginning at start and extending up to but not including end.
Python slice() Function The slice() function returns a slice object. A slice object is used to specify how to slice a sequence. You can specify where to start the slicing, and where to end. You can also specify the step, which allows you to e.g. slice only every other item.
Use:
for elem in x[5 : 5+k]:
It's Pythonic! Don't change this until you've profiled your code and determined that this is a bottleneck -- though I doubt you will ever find this to be the main source of a bottleneck.
In terms of speed it will probably be your best choice:
In [30]: x = range(100)
In [31]: k = 90
In [32]: %timeit x[5:5+k]
1000000 loops, best of 3: 357 ns per loop
In [35]: %timeit list(IT.islice(x, 5, 5+k))
100000 loops, best of 3: 2.42 us per loop
In [36]: %timeit [x[i] for i in xrange(5, 5+k)]
100000 loops, best of 3: 5.71 us per loop
In terms of memory, it is not as bad you might think. x[5: 5+k]
is a shallow copy of part of x
. So even if the objects in x
are large, x[5: 5+k]
is creating a new list with k elements which reference the same objects as in x
. So you only need extra memory to create a list with k references to pre-existing objects. That probably is not going to be the source of any memory problems.
You can use itertools.islice
to get a sliced iterator from the list:
Example:
>>> from itertools import islice
>>> lis = range(20)
>>> for x in islice(lis, 10, None, 1):
... print x
...
10
11
12
13
14
15
16
17
18
19
As noted by @user2357112 the performance of islice
depends on the start point of slice and the size of the iterable, normal slice is going to be fast in almost all cases and should be preferred. Here are some more timing comparisons:
For Huge lists islice
is slightly faster or equal to normal slice when the slice's start point is less than half the size of list, for bigger indexes normal slice is the clear winner.
>>> def func(lis, n):
it = iter(lis)
for x in islice(it, n, None, 1):pass
...
>>> def func1(lis, n):
#it = iter(lis)
for x in islice(lis, n, None, 1):pass
...
>>> def func2(lis, n):
for x in lis[n:]:pass
...
>>> lis = range(10**6)
>>> n = 100
>>> %timeit func(lis, n)
10 loops, best of 3: 62.1 ms per loop
>>> %timeit func1(lis, n)
1 loops, best of 3: 60.8 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 82.8 ms per loop
>>> n = 1000
>>> %timeit func(lis, n)
10 loops, best of 3: 64.4 ms per loop
>>> %timeit func1(lis, n)
1 loops, best of 3: 60.3 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 85.8 ms per loop
>>> n = 10**4
>>> %timeit func(lis, n)
10 loops, best of 3: 61.4 ms per loop
>>> %timeit func1(lis, n)
10 loops, best of 3: 61 ms per loop
>>> %timeit func2(lis, n)
1 loops, best of 3: 80.8 ms per loop
>>> n = (10**6)/2
>>> %timeit func(lis, n)
10 loops, best of 3: 39.2 ms per loop
>>> %timeit func1(lis, n)
10 loops, best of 3: 39.6 ms per loop
>>> %timeit func2(lis, n)
10 loops, best of 3: 41.5 ms per loop
>>> n = (10**6)-1000
>>> %timeit func(lis, n)
100 loops, best of 3: 18.9 ms per loop
>>> %timeit func1(lis, n)
100 loops, best of 3: 18.8 ms per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 50.9 us per loop #clear winner for large index
>>> %timeit func1(lis, n)
For Small lists normal slice is faster than islice
for almost all cases.
>>> lis = range(1000)
>>> n = 100
>>> %timeit func(lis, n)
10000 loops, best of 3: 60.7 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 59.6 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 59.9 us per loop
>>> n = 500
>>> %timeit func(lis, n)
10000 loops, best of 3: 38.4 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 33.9 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 26.6 us per loop
>>> n = 900
>>> %timeit func(lis, n)
10000 loops, best of 3: 20.1 us per loop
>>> %timeit func1(lis, n)
10000 loops, best of 3: 17.2 us per loop
>>> %timeit func2(lis, n)
10000 loops, best of 3: 11.3 us per loop
Go for normal slices.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With