Inspired by this nice answer, here's a benchmark:
import timeit

def test1():
    a = [1, 2, 3]
    a.insert(0, 1)

def test2():
    a = [1, 2, 3]
    a[0:0] = [1]

print(timeit.timeit('test1()', 'from __main__ import test1'))
print(timeit.timeit('test2()', 'from __main__ import test2'))
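As a sanity check, both approaches leave the list in the same state, so the benchmark really is comparing two ways of doing the same thing:

```python
# Both operations insert 1 at the front of [1, 2, 3].
a = [1, 2, 3]
a.insert(0, 1)

b = [1, 2, 3]
b[0:0] = [1]

print(a, b)  # both are [1, 1, 2, 3]
```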
For me, test2 is slightly faster (~10%). Why is that the case? I would have expected it to be slower. Can anybody help me understand this?
(Using Python 2.7 on OS X 10.5.8.)
Your first test case has to call the method insert on the list a, whereas all the operations in test2 are handled directly in bytecode. Note the CALL_FUNCTION in the disassembly of test1 below. Calling functions is moderately expensive in Python: certainly expensive enough to account for a few percent difference in run time.
>>> import dis
>>> dis.dis(test1)
2 0 LOAD_CONST 1 (1)
3 LOAD_CONST 2 (2)
6 LOAD_CONST 3 (3)
9 BUILD_LIST 3
12 STORE_FAST 0 (a)
3 15 LOAD_FAST 0 (a)
18 LOAD_ATTR 0 (insert)
21 LOAD_CONST 4 (0)
24 LOAD_CONST 1 (1)
27 CALL_FUNCTION 2
30 POP_TOP
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
>>> dis.dis(test2)
2 0 LOAD_CONST 1 (1)
3 LOAD_CONST 2 (2)
6 LOAD_CONST 3 (3)
9 BUILD_LIST 3
12 STORE_FAST 0 (a)
3 15 LOAD_CONST 1 (1)
18 BUILD_LIST 1
21 LOAD_FAST 0 (a)
24 LOAD_CONST 4 (0)
27 LOAD_CONST 4 (0)
30 STORE_SLICE+3
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
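The call overhead itself is easy to see in isolation. A rough illustration (the comparison is my own, and absolute timings vary by machine and interpreter):

```python
import timeit

# Time a no-op function call against a bare name lookup: the gap is
# roughly the per-call overhead (argument handling, frame setup).
t_call = timeit.timeit('f()', setup='f = lambda: None')
t_lookup = timeit.timeit('x', setup='x = 0')

print(t_call > t_lookup)  # the call is the slower of the two
```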
I posted this first, but after consideration I think it is not correct. The difference I describe here should only make a noticeable difference when there is a lot of data to be moved, which isn't the case in the test here. And even with a lot of data the difference is only a couple of percent:
import timeit

def test1():
    a = range(10000000)
    a.insert(1, 1)

def test2():
    a = range(10000000)
    a[1:1] = [1]
>>> timeit.timeit(test1, number=10)
6.008707046508789
>>> timeit.timeit(test2, number=10)
5.861173868179321
The method list.insert is implemented by the function ins1 in listobject.c. You'll see that it copies the item references for the tail of the list one by one:
for (i = n; --i >= where; )
    items[i+1] = items[i];
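In Python terms, that loop amounts to shifting every trailing reference one slot to the right, one element at a time (a sketch of the idea, not CPython's actual code):

```python
def insert_sketch(items, where, value):
    # Model of ins1: grow by one, shift the tail right element by
    # element (the C loop above), then drop the new value in place.
    items.append(None)  # make room, as list_resize does in C
    for i in range(len(items) - 2, where - 1, -1):
        items[i + 1] = items[i]
    items[where] = value

lst = [10, 20, 30]
insert_sketch(lst, 1, 99)
print(lst)  # [10, 99, 20, 30]
```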
On the other hand, slice assignment is implemented by the function list_ass_slice, which calls memmove:
memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));
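The effect of that call, expressed in Python, is a single bulk move of the tail rather than an element-by-element loop (again a sketch; list_ass_slice itself works at the level of raw pointers):

```python
def slice_assign_sketch(items, lo, hi, new):
    # Model of list_ass_slice: save the tail, then rebuild the list
    # as head + new + tail; memmove shifts the tail in one bulk step.
    tail = items[hi:]
    del items[lo:]
    items.extend(new)
    items.extend(tail)

lst = [1, 2, 3, 4]
slice_assign_sketch(lst, 1, 1, [99])
print(lst)  # [1, 99, 2, 3, 4]
```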
So I think the answer to your question is that the C library function memmove is better optimized than the simple loop. See here for the glibc implementation of memmove: I believe that when called from list_ass_slice it eventually ends up calling _wordcopy_bwd_aligned, which you can see is heavily hand-optimized.