
Python 3.5: slice vs islice vs alternatives? Efficiency comparison

Tags: python, list, slice

Context

This is a general question regarding efficiency. I have a list and I need a consecutive run / sublist from it. Typically, this is done via a slice:

my_list[start:end]

However, a slice creates a new list (a shallow copy that holds references to the same objects as the original list). It is therefore possible that this operation is slower than an approach that avoids the copy.
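To make the copying behaviour concrete, here is a minimal sketch (the nested example data is made up for illustration). It shows that a slice is a new list object whose elements are the very same objects as in the original:

```python
# Illustrative data: a list of mutable inner lists
my_list = [[1, 2], [3, 4], [5, 6]]
sub = my_list[0:2]

# The slice is a distinct list object...
print(sub is my_list)        # False
# ...but its elements are the same objects, not copies
print(sub[0] is my_list[0])  # True

# Mutating a shared element is visible through both lists
sub[0].append(99)
print(my_list[0])            # [1, 2, 99]

# Rebinding a position in the slice does not affect the original
sub[1] = "replaced"
print(my_list[1])            # [3, 4]
```

So the cost of a slice is the allocation of the new list plus copying one reference per element, not copying the elements themselves.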

islice is an alternative which makes an iterator instead. Since I only care about having all the values at once, not iterating over them, I will have to convert the result to a list:

list(islice(my_list, start, end))

Background Work

To do some comparisons I randomly sliced/isliced 10 times on lists of increasing size from 1 to 9,999:

import random
import statistics
import time
from itertools import islice

is_vals = []  # timings for list(islice(...))
s_vals = []   # timings for plain slicing
for l in range(1, 10000):

    my_list = [random.random() for k in range(l)]

    for p in range(10):
        # Two random bounds, ordered so that i <= j
        i, j = sorted((random.randint(0, l), random.randint(0, l)))

        # time.clock() was removed in Python 3.8; time.perf_counter() replaces it
        start_time = time.perf_counter()
        list(islice(my_list, i, j))
        is_vals.append(time.perf_counter() - start_time)

        start_time = time.perf_counter()
        my_list[i:j]
        s_vals.append(time.perf_counter() - start_time)

# Difference of the mean timings in seconds (positive: islice was slower)
print(statistics.mean(is_vals) - statistics.mean(s_vals))

What I found is that slice is still faster, with the difference between the mean islice and slice timings being 2.99e-05 seconds.

I am not sure, but I will go ahead and chalk that up to the cost of converting the iterator object to a list.

Question

Is there a more efficient way than slice to get a consecutive run / sublist in a list?

Bonus: is there a way to more or less typecast a list / tuple into a slice, e.g. turn [i, j] into i:j?

asked Dec 23 '22 22:12 by SumNeuron


2 Answers

No, you can't beat my_list[start:stop] for speed, not if you want a new list object containing the same elements from a contiguous area of the input list.

That's because the list type implementation has direct access to the internal storage for a list object. You can't get access to those elements any faster from outside.
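If you want to check this on your own machine, a minimal sketch using the standard timeit module could look like the following. The list size, the bounds and the helper names with_slice / with_islice are arbitrary choices for illustration, and absolute numbers will vary by machine and Python version:

```python
import timeit
from itertools import islice

# Arbitrary illustrative sizes, not taken from the question
my_list = list(range(10000))
start, stop = 2000, 8000

def with_slice():
    # Direct slice: the list type copies references from its own storage
    return my_list[start:stop]

def with_islice():
    # Iterator-based route, then materialised into a list
    return list(islice(my_list, start, stop))

# Take the best of several repeats to reduce noise from other processes
slice_time = min(timeit.repeat(with_slice, number=1000, repeat=5))
islice_time = min(timeit.repeat(with_islice, number=1000, repeat=5))

print("slice:        {:.6f} s per 1000 calls".format(slice_time))
print("list(islice): {:.6f} s per 1000 calls".format(islice_time))
```

timeit runs each callable many times, which tends to give steadier numbers than timing single calls by hand.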

Only use iterators when memory efficiency is important. Iterators add iteration overhead; they are generally not faster. In this case, the expression list(islice(my_list, start, stop)) will do the following work:

  1. Create a list iterator object for my_list; this will yield elements from my_list as you iterate over it.
  2. Create a new islice() iterator object; this will skip start elements from the list iterator, then produce values until you reach the stop index.
  3. Produce an iterator from the islice() iterator object. In this case that will just re-use the same object, but this is still a separate (C) function call.
  4. Produce a new list object from all the elements yielded by the iterator from step 3.
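The four steps above can be spelled out by hand. This is only a rough Python-level sketch of work that CPython does in C, with made-up example values:

```python
from itertools import islice

my_list = list(range(10))
start, stop = 2, 6

# Step 1: a list iterator over my_list
list_iter = iter(my_list)

# Step 2: an islice object wrapping that iterator
sliced = islice(list_iter, start, stop)

# Step 3: list() asks its argument for an iterator;
# an islice object simply returns itself
same = iter(sliced)
print(same is sliced)  # True

# Step 4: a new list is built from everything the iterator yields
result = list(same)
print(result)          # [2, 3, 4, 5]
```

Each yielded element goes through the iterator protocol individually, which is where the extra overhead compared to a direct slice comes from.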

The my_list[start:stop] call, on the other hand, only does this:

  1. Call my_list.__getitem__(slice(start, stop)). This method directly produces a new list object, with the same elements copied from its internal array directly into the new list object's array.
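The slice object passed to __getitem__ can also be built explicitly, which is one way to approach the bonus part of the question. A small sketch, where bounds is a hypothetical [i, j] pair:

```python
my_list = list(range(10))
bounds = [2, 6]  # hypothetical [i, j] pair

# Unpack the pair into the built-in slice() constructor
s = slice(*bounds)

# Indexing with a slice object is equivalent to the i:j syntax
print(my_list[s] == my_list[2:6])  # True

# This is what the subscript syntax calls under the hood
print(my_list.__getitem__(s))      # [2, 3, 4, 5]

# indices() resolves the slice to concrete (start, stop, step) for a length
print(s.indices(len(my_list)))     # (2, 6, 1)
```

A slice object can be stored and reused across several lists, which is handy when the same bounds apply to more than one sequence.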
answered Jan 29 '23 10:01 by Martijn Pieters


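import random
import time
from itertools import islice
import statistics

l = 1000000
is_vals, s_vals = [], []
my_list = [random.random() for _ in range(l)]
for p in range(10):
    i = random.randint(0, l//3)
    j = random.randint(l-l//3, l)

    # Iterate lazily with islice (no intermediate list).
    # time.perf_counter() replaces time.clock(), which was removed in Python 3.8.
    start_time = time.perf_counter()
    sum1 = 0
    for k in islice(my_list, i, j):
        sum1 += k
    is_vals.append(time.perf_counter() - start_time)
    # Iterate over a slice, which first copies the range into a new list.
    start_time = time.perf_counter()
    sum2 = 0
    for k in my_list[i:j]:
        sum2 += k
    s_vals.append(time.perf_counter() - start_time)
    assert sum1 == sum2

print(is_vals)
print(s_vals)
# A negative difference means islice was faster on average
print(statistics.mean(is_vals)-statistics.mean(s_vals))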

This shows that islice is slightly faster than slice when you only iterate over the result. This is because the Python interpreter creates a new list (my_list[i:j]) and then iterates over it in the line

for k in my_list[i:j]:

whereas in the line

for k in islice(my_list, i, j):

it does not create a new list; it iterates over my_list itself, skipping the first i elements and then yielding the elements from index i up to (but not including) index j. However, when you write

list(islice(my_list, i, j))

a new list is created as well, so you don't see any advantage over slice.
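The memory side of this can be observed with the standard tracemalloc module. The following is a rough sketch with made-up sizes; peak_bytes, via_slice and via_islice are illustrative helper names, and exact byte counts depend on the Python build:

```python
import tracemalloc
from itertools import islice

def peak_bytes(func):
    """Return the peak traced memory (in bytes) while func() runs."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# Built before tracing starts, so it is not counted in either measurement
data = list(range(200000))
i, j = 1000, 199000

def via_slice():
    total = 0
    for k in data[i:j]:          # allocates a temporary list first
        total += k
    return total

def via_islice():
    total = 0
    for k in islice(data, i, j):  # no temporary list
        total += k
    return total

slice_peak = peak_bytes(via_slice)
islice_peak = peak_bytes(via_islice)
print("slice peak:  {} bytes".format(slice_peak))
print("islice peak: {} bytes".format(islice_peak))
```

The slice variant's peak should be dominated by the temporary list of references, while the islice variant only needs a small iterator object.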

answered Jan 29 '23 08:01 by Wolfram