Note: I'm a Ruby developer trying to find my way in Python.
When I wanted to figure out why some scripts use mylist[:]
instead of list(mylist)
to duplicate lists, I made a quick benchmark of the various methods to duplicate range(10)
(see code below).
EDIT: I updated the tests to make use of Python's timeit
as suggested below. This makes it impossible to directly compare it to Ruby, because timeit doesn't account for the looping while Ruby's Benchmark
does, so Ruby code is for reference only.
Python 2.7.2
Array duplicating. Tests run 50000000 times
list(a) 18.7599430084
copy(a) 59.1787488461
a[:] 9.58828091621
a[0:len(a)] 14.9832749367
For reference, I wrote the same script in Ruby too:
Ruby 1.9.2p0
Array duplicating. Tests 50000000 times
user system total real
Array.new(a) 14.590000 0.030000 14.620000 ( 14.693033)
Array[*a] 18.840000 0.060000 18.900000 ( 19.156352)
a.take(a.size) 8.780000 0.020000 8.800000 ( 8.805700)
a.clone 16.310000 0.040000 16.350000 ( 16.384711)
a[0,a.size] 8.950000 0.020000 8.970000 ( 8.990514)
Question 1: what is mylist[:]
doing differently that it is 25 % faster than even mylist[0:len(mylist)]
. Does it copy in memory directly or what?
Question 2: edit: updated benchmarks don't show huge differences in Python and Ruby anymore. was: Did I implement the tests in some obviously inefficient way, so that Ruby code is so much faster than Python?
Now the code listings:
Python:
import timeit
COUNT = 50000000
print "Array duplicating. Tests run", COUNT, "times"
setup = 'a = range(10); import copy'
print "list(a)\t\t", timeit.timeit(stmt='list(a)', setup=setup, number=COUNT)
print "copy(a)\t\t", timeit.timeit(stmt='copy.copy(a)', setup=setup, number=COUNT)
print "a[:]\t\t", timeit.timeit(stmt='a[:]', setup=setup, number=COUNT)
print "a[0:len(a)]\t", timeit.timeit(stmt='a[0:len(a)]', setup=setup, number=COUNT)
Ruby:
require 'benchmark'
a = (0...10).to_a
COUNT = 50_000_000
puts "Array duplicating. Tests #{COUNT} times"
Benchmark.bm(16) do |x|
x.report("Array.new(a)") {COUNT.times{ Array.new(a) }}
x.report("Array[*a]") {COUNT.times{ Array[*a] }}
x.report("a.take(a.size)") {COUNT.times{ a.take(a.size) }}
x.report("a.clone") {COUNT.times{ a.clone }}
x.report("a[0,a.size]"){COUNT.times{ a[0,a.size] }}
end
To make a deep copy, use the deepcopy() function of the copy module. In a deep copy, copies are inserted instead of references to objects, so changing one does not change the other.
deepcopy() Another key function in the copy module is the deepcopy() function. This function creates a completely independent copy of a list or any other compound object in Python.
Use the timeit
module in python for testing timings.
from copy import *
a=range(1000)
def cop():
b=copy(a)
def func1():
b=list(a)
def slice():
b=a[:]
def slice_len():
b=a[0:len(a)]
if __name__=="__main__":
import timeit
print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop")
print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1")
print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice")
print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len")
Results:
copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277 #winner
a[0:len(a)] 10.5431251526
It's surely the extra steps involved in a[0:len(a)]
are the reason for it's slowness.
Here's the byte code comparison of the two:
In [19]: dis.dis(func1)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)
3 12 LOAD_FAST 0 (a)
15 SLICE+0
16 STORE_FAST 1 (b)
19 LOAD_CONST 0 (None)
22 RETURN_VALUE
In [20]: dis.dis(func2)
2 0 LOAD_GLOBAL 0 (range)
3 LOAD_CONST 1 (100000)
6 CALL_FUNCTION 1
9 STORE_FAST 0 (a)
3 12 LOAD_FAST 0 (a) #same up to here
15 LOAD_CONST 2 (0) #loads 0
18 LOAD_GLOBAL 1 (len) # loads the builtin len(),
# so it might take some lookup time
21 LOAD_FAST 0 (a)
24 CALL_FUNCTION 1
27 SLICE+3
28 STORE_FAST 1 (b)
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
I can't comment on the ruby timing vs. the python timing. But I can comment on list
vs. slice
. Here's a quick inspection of the bytecode:
>>> import dis
>>> a = range(10)
>>> def func(a):
... return a[:]
...
>>> def func2(a):
... return list(a)
...
>>> dis.dis(func)
2 0 LOAD_FAST 0 (a)
3 SLICE+0
4 RETURN_VALUE
>>> dis.dis(func2)
2 0 LOAD_GLOBAL 0 (list)
3 LOAD_FAST 0 (a)
6 CALL_FUNCTION 1
9 RETURN_VALUE
Notice that list
requires a LOAD_GLOBAL
to find the function list
. Looking up globals (and calling functions) in python is relatively slow. This would explain why a[0:len(a)]
is also slower. Also remember that list
needs to be able to handle arbitrary iterators whereas slicing doesn't. This means that list
needs to allocate a new list, pack elements into that list as it iterates over the list and resize when necessary. There are a few things in here which are expensive -- resizing if necessary and iterating (effectively in python, not C). With the slicing method, you can calculate the size of the memory you'll need so can probably avoid resizing, and the iteration can be done completely in C (probably with a memcpy
or something.
disclaimer : I'm not a python dev, so I don't know how the internals of list()
are implemented for sure. I'm just speculating based what I know of the specification.
EDIT -- So I've looked at the source (with a little guidance from Martijn). The relevant code is in listobject.c
. list
calls list_init
which then calls listextend
at line 799. That function has some checks to see if it can use a fast branch if the object is a list or a tuple (line 812). Finally, the heavy lifting is done starting at line 834:
src = PySequence_Fast_ITEMS(b);
dest = self->ob_item + m;
for (i = 0; i < n; i++) {
PyObject *o = src[i];
Py_INCREF(o);
dest[i] = o;
}
Compare that to the slice version which I think is defined in list_subscript
(line 2544). That calls list_slice
(line 2570) where the heavy lifting is done by the following loop (line 486):
src = a->ob_item + ilow;
dest = np->ob_item;
for (i = 0; i < len; i++) {
PyObject *v = src[i];
Py_INCREF(v);
dest[i] = v;
}
They're pretty much the same code, so it's not surprising that the performance is almost the same for large lists (where the overhead of the small stuff like unpacking slices, looking up global variables, etc becomes less important)
Here's how I would run the python tests (and the results for my Ubuntu system):
$ python -m timeit -s 'a=range(30)' 'list(a)'
1000000 loops, best of 3: 0.39 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[:]'
10000000 loops, best of 3: 0.183 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[0:len(a)]'
1000000 loops, best of 3: 0.254 usec per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With