Why is slice assignment faster than `list.insert`?

Inspired by this nice answer, here's a benchmark:

import timeit

def test1():
    a = [1,2,3]
    a.insert(0,1)      # insert via the list.insert method call

def test2():
    a = [1,2,3]
    a[0:0] = [1]       # insert via slice assignment

print(timeit.timeit('test1()', 'from __main__ import test1'))
print(timeit.timeit('test2()', 'from __main__ import test2'))

For me, test2 is slightly faster (~10%). Why is that the case? I would expect it to be slower, since:

  1. slice assignment must be able to accept iterables of any length and therefore must be more general;
  2. in slice assignment, we need to create a new list on the right-hand side just to get it to work.

Can anybody help me understand this?

(Using Python 2.7 on OS X 10.5.8.)

asked Sep 21 '12 by mgilson


1 Answer

Your first test case has to call the method insert on the list a, whereas all the operations in test2 are handled directly in byte code. Note the CALL_FUNCTION in the disassembly of test1 below. Calling functions is moderately expensive in Python: certainly expensive enough to account for a few percent difference in run time.

>>> import dis
>>> dis.dis(test1)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_FAST                0 (a)
             18 LOAD_ATTR                0 (insert)
             21 LOAD_CONST               4 (0)
             24 LOAD_CONST               1 (1)
             27 CALL_FUNCTION            2
             30 POP_TOP             
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
>>> dis.dis(test2)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_CONST               1 (1)
             18 BUILD_LIST               1
             21 LOAD_FAST                0 (a)
             24 LOAD_CONST               4 (0)
             27 LOAD_CONST               4 (0)
             30 STORE_SLICE+3       
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
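
One way to convince yourself that the lookup-and-call machinery is what costs (a sketch of my own, not part of the original answer) is to perform the identical slice assignment through an explicit method call and compare it with the bytecode form:

import timeit

def call_version():
    a = [1, 2, 3]
    a.__setitem__(slice(0, 0), [1])   # same slice assignment, but via LOAD_ATTR + CALL_FUNCTION

def bytecode_version():
    a = [1, 2, 3]
    a[0:0] = [1]                      # same slice assignment, handled directly in bytecode

print(timeit.timeit('call_version()', 'from __main__ import call_version'))
print(timeit.timeit('bytecode_version()', 'from __main__ import bytecode_version'))

Both functions end up performing the same insertion, so most of any gap between them comes down to the attribute lookup and the function call.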

Bad explanation

I posted this first, but after consideration I think it is not correct. The effect I describe here should only be noticeable when there is a lot of data to be moved, which isn't the case in the test here. And even with a lot of data the difference is only a couple of percent:

import timeit

def test1():
    a = range(10000000)    # Python 2: range() returns a list
    a.insert(1,1)

def test2():
    a = range(10000000)
    a[1:1] = [1]

>>> timeit.timeit(test1, number=10)
6.008707046508789
>>> timeit.timeit(test2, number=10)
5.861173868179321

The method list.insert is implemented by the function ins1 in listobject.c. You'll see that it copies the item references for the tail of the list one by one:

for (i = n; --i >= where; )
    items[i+1] = items[i];
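
As a loose illustration (my addition, not the CPython source; model_insert is a made-up name), that shifting loop behaves roughly like this pure-Python model:

def model_insert(items, where, value):
    # grow the list by one slot, then shift references one at a time,
    # mirroring the C loop above
    items.append(None)
    for i in range(len(items) - 2, where - 1, -1):
        items[i + 1] = items[i]
    items[where] = value

So each insert near the front of a list touches every reference after the insertion point.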

On the other hand, slice assignment is implemented by the function list_ass_slice, which calls memmove:

memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));

So I think the answer to your question is that the C library function memmove is better optimized than the simple loop. See here for the glibc implementation of memmove; I believe that when called from list_ass_slice it eventually ends up calling _wordcopy_bwd_aligned, which you can see is heavily hand-optimized.
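
To get a feel for bulk moves versus per-element shifts, here is a rough micro-benchmark of my own (not from the answer) that does the same byte shift with a Python loop and with ctypes.memmove. The per-byte loop runs as interpreted bytecode, so it wildly exaggerates the real C-loop-versus-memmove gap, but the direction of the result is the same: one bulk move wins.

import ctypes
import timeit

N = 10 ** 6
buf = bytearray(N + 1)

def shift_by_loop():
    # move buf[0:N] to buf[1:N+1], one byte at a time (like ins1's loop)
    for i in range(N - 1, -1, -1):
        buf[i + 1] = buf[i]

def shift_by_memmove():
    # the same move as a single bulk memmove (like list_ass_slice)
    c_buf = (ctypes.c_char * len(buf)).from_buffer(buf)
    addr = ctypes.addressof(c_buf)
    ctypes.memmove(addr + 1, addr, N)
    del c_buf   # release the exported buffer view

print(timeit.timeit(shift_by_loop, number=10))
print(timeit.timeit(shift_by_memmove, number=10))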

answered by Gareth Rees