This is a general question regarding efficiency. I have a list and I need a consecutive run / sublist from a list. Typically, this is done via a slice:
my_list[start:end]
However, a slice creates a new list: a shallow copy that holds references to the same elements as the original. That copying could make it slower than an approach that avoids it.
islice is an alternative that returns an iterator instead. Since I need all the values at once rather than iterating over them, I have to convert the result back to a list:
list(islice(my_list, start, end))
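To make the comparison concrete, here is a minimal sketch showing that both expressions produce equal lists, and that a slice is a shallow copy: a new list object whose elements are the same objects as in the original. The sample data and variable names are illustrative assumptions, not part of the original question.

```python
from itertools import islice

# Nested lists are used as elements so that object identity is easy to check.
my_list = [[0], [1], [2], [3], [4]]
start, end = 1, 4

sliced = my_list[start:end]
via_islice = list(islice(my_list, start, end))

# Both approaches give lists with equal contents.
same_values = sliced == via_islice

# The slice is a new list object, but its elements are the original objects.
is_new_list = sliced is not my_list
shares_elements = sliced[0] is my_list[start]

# Appending to the copy leaves the original list untouched.
sliced.append([99])
original_length = len(my_list)
```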
To do some comparisons I randomly sliced/isliced 10 times on lists of increasing size from 1 to 10,000:
import random
import statistics
import time
from itertools import islice

is_vals = []
s_vals = []
for l in range(1, 10000):
    my_list = [random.random() for k in range(l)]
    for p in range(10):
        # Pick two random bounds and order them so that i <= j.
        i, j = sorted((random.randint(0, l), random.randint(0, l)))
        # time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
        start_time = time.perf_counter()
        list(islice(my_list, i, j))
        is_vals.append(time.perf_counter() - start_time)
        start_time = time.perf_counter()
        my_list[i:j]
        s_vals.append(time.perf_counter() - start_time)
print(statistics.mean(is_vals) - statistics.mean(s_vals))
What I found is that slice is still faster: the mean time for islice minus the mean time for slice was 2.99e-05 seconds.
I am not sure, but I will chalk that up to converting the iterator back into a list.
Is there a more efficient way than a slice to get a consecutive run / sublist of a list?
Bonus: is there a way to more or less convert a list / tuple into a slice, e.g. turn [i, j] into i:j?
You can't beat mylist[start:stop] in speed, no. Not if you want a new list object containing the same elements from a contiguous area of the input list.
That's because the list type implementation has direct access to the internal storage for a list object. You can't get access to those elements any faster from outside.
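If you want to check this on your own machine, a rough sketch using the standard timeit module could look like the following. The list size, the bounds, and the helper names with_slice and with_islice are assumptions chosen for illustration; absolute numbers will vary by interpreter and hardware.

```python
import random
import timeit
from itertools import islice

my_list = [random.random() for _ in range(100_000)]
start, stop = 10_000, 90_000

def with_slice():
    # A single call into the list implementation.
    return my_list[start:stop]

def with_islice():
    # Goes through the iterator protocol before building the list.
    return list(islice(my_list, start, stop))

# timeit calls each function many times; taking the minimum over a few
# repeats reduces the noise that single-shot timings suffer from.
slice_time = min(timeit.repeat(with_slice, number=50, repeat=3))
islice_time = min(timeit.repeat(with_islice, number=50, repeat=3))

print(f"slice:        {slice_time:.4f} s for 50 calls")
print(f"list(islice): {islice_time:.4f} s for 50 calls")
```

Repeated runs through timeit are generally more trustworthy than wrapping a single call in two timestamps, because one call is often shorter than the timer's own jitter.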
Only use iterators when memory efficiency is important. Iterators add iteration overhead; they are generally not faster. In this case, the expression list(islice(my_list, start, stop)) will do the following work:
1. Create a list iterator object for my_list; this will yield elements from my_list as you iterate over it.
2. Create a new islice() iterator object; this will skip start elements from the list iterator, then produce values until you reach the stop index.
3. Produce an iterator from the islice() iterator object. In this case that will just re-use the same object, but this is still a separate (C) function call.
4. Build a new list object from all the elements that iterator yields.
The mylist[start:stop] call, on the other hand, only does this:
1. Call mylist.__getitem__(slice(start, stop)). This method directly produces a new list object, with the elements copied from its internal array straight into the new list object's array.
import random
import time
from itertools import islice
import statistics

l = 1000000
is_vals, s_vals = [], []
my_list = [random.random() for _ in range(l)]
for p in range(10):
    i = random.randint(0, l//3)
    j = random.randint(l-l//3, l)
    # time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
    start_time = time.perf_counter()
    sum1 = 0
    for k in islice(my_list, i, j):
        sum1 += k
    is_vals.append(time.perf_counter() - start_time)
    start_time = time.perf_counter()
    sum2 = 0
    for k in my_list[i:j]:
        sum2 += k
    s_vals.append(time.perf_counter() - start_time)
    assert sum1 == sum2
print(is_vals)
print(s_vals)
print(statistics.mean(is_vals)-statistics.mean(s_vals))
This shows that islice is slightly faster than slice when you only iterate over the result. That is because, in the line
for k in my_list[i:j]:
the Python interpreter first creates a new list (my_list[i:j]) and then iterates over it, whereas in the line
for k in islice(my_list, i, j):
no new list is created: islice steps through my_list's own iterator, discarding the first i elements and yielding those from index i up to j. However, when you write
list(islice(my_list, i, j))
a new list is created as well, so you don't see any advantage over slice.
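Written out explicitly, the two ways of materialising a sublist look roughly like this. It is a sketch with made-up sample data; the intermediate variable names are illustrative only. The last lines also touch on the bonus question: the built-in slice type accepts unpacked bounds, so a pair such as [i, j] can be turned into a slice object with slice(*pair).

```python
from itertools import islice

my_list = list(range(10))
start, stop = 2, 6

# list(islice(my_list, start, stop)), one step at a time:
list_iter = iter(my_list)                      # a list iterator over my_list
sliced_iter = islice(list_iter, start, stop)   # islice wraps that iterator
same_iter = iter(sliced_iter)                  # list() asks for an iterator; islice returns itself
via_islice = list(same_iter)                   # a new list is built item by item

# my_list[start:stop] is a single method call that receives a slice object:
via_getitem = my_list.__getitem__(slice(start, stop))

# Bonus: build the slice object from a list or tuple of bounds.
bounds = [start, stop]
via_pair = my_list[slice(*bounds)]
```

All three results are equal lists; they differ only in how many intermediate objects and calls are involved in producing them.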