Note: I'm a Ruby developer trying to find my way in Python. When I wanted to figure out why some scripts use <code>mylist[:]</code> instead of <code>list(mylist)</code> to duplicate lists, I made a quick benchmark of the various methods to duplicate <code>range(10)</code> (see code below). EDIT: I updated the tests to make use of Python's <code>timeit</code> as suggested below. This makes it impossible to directly compare it to Ruby, because timeit doesn't account for the looping while Ruby's <code>Benchmark</code> does, so Ruby code is for reference only. Python 2.7.2 <pre class="prettyprint"><code>Array duplicating. Tests run 50000000 times list(a) 18.7599430084 copy(a) 59.1787488461 a[:] 9.58828091621 a[0:len(a)] 14.9832749367 </code></pre> For reference, I wrote the same script in Ruby too: Ruby 1.9.2p0 <pre class="prettyprint"><code>Array duplicating. Tests 50000000 times user system total real Array.new(a) 14.590000 0.030000 14.620000 ( 14.693033) Array[*a] 18.840000 0.060000 18.900000 ( 19.156352) a.take(a.size) 8.780000 0.020000 8.800000 ( 8.805700) a.clone 16.310000 0.040000 16.350000 ( 16.384711) a[0,a.size] 8.950000 0.020000 8.970000 ( 8.990514) </code></pre> Question 1: what is <code>mylist[:]</code> doing differently that it is 25 % faster than even <code>mylist[0:len(mylist)]</code>. Does it copy in memory directly or what? Question 2: edit: updated benchmarks don't show huge differences in Python and Ruby anymore. was: Did I implement the tests in some obviously inefficient way, so that Ruby code is so much faster than Python? Now the code listings: Python: <pre class="prettyprint"><code>import timeit COUNT = 50000000 print "Array duplicating. Tests run", COUNT, "times" setup = 'a = range(10); import copy' print "list(a)\t\t", timeit.timeit(stmt='list(a)', setup=setup, number=COUNT) print "copy(a)\t\t", timeit.timeit(stmt='copy.copy(a)', setup=setup, number=COUNT) print "a[:]\t\t", timeit.timeit(stmt='a[:]', setup=setup, number=COUNT) print "a[0:len(a)]\t", timeit.timeit(stmt='a[0:len(a)]', setup=setup, number=COUNT) </code></pre> Ruby: <pre class="prettyprint"><code>require 'benchmark' a = (0...10).to_a COUNT = 50_000_000 puts "Array duplicating. Tests #{COUNT} times" Benchmark.bm(16) do |x| x.report("Array.new(a)") {COUNT.times{ Array.new(a) }} x.report("Array[*a]") {COUNT.times{ Array[*a] }} x.report("a.take(a.size)") {COUNT.times{ a.take(a.size) }} x.report("a.clone") {COUNT.times{ a.clone }} x.report("a[0,a.size]"){COUNT.times{ a[0,a.size] }} end </code></pre>

Use the <code>timeit</code> module in python for testing timings. <pre class="prettyprint"><code>from copy import * a=range(1000) def cop(): b=copy(a) def func1(): b=list(a) def slice(): b=a[:] def slice_len(): b=a[0:len(a)] if __name__=="__main__": import timeit print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop") print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1") print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice") print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len") </code></pre> Results: <pre class="prettyprint"><code>copy(a) 3.98940896988 list(a) 2.54542589188 a[:] 1.96630120277 #winner a[0:len(a)] 10.5431251526 </code></pre> It's surely the extra steps involved in <code>a[0:len(a)]</code> are the reason for it's slowness. Here's the byte code comparison of the two: <pre class="prettyprint"><code>In [19]: dis.dis(func1) 2 0 LOAD_GLOBAL 0 (range) 3 LOAD_CONST 1 (100000) 6 CALL_FUNCTION 1 9 STORE_FAST 0 (a) 3 12 LOAD_FAST 0 (a) 15 SLICE+0 16 STORE_FAST 1 (b) 19 LOAD_CONST 0 (None) 22 RETURN_VALUE In [20]: dis.dis(func2) 2 0 LOAD_GLOBAL 0 (range) 3 LOAD_CONST 1 (100000) 6 CALL_FUNCTION 1 9 STORE_FAST 0 (a) 3 12 LOAD_FAST 0 (a) #same up to here 15 LOAD_CONST 2 (0) #loads 0 18 LOAD_GLOBAL 1 (len) # loads the builtin len(), # so it might take some lookup time 21 LOAD_FAST 0 (a) 24 CALL_FUNCTION 1 27 SLICE+3 28 STORE_FAST 1 (b) 31 LOAD_CONST 0 (None) 34 RETURN_VALUE </code></pre>

I can't comment on the ruby timing vs. the python timing. But I can comment on <code>list</code> vs. <code>slice</code>. Here's a quick inspection of the bytecode: <pre class="prettyprint"><code>>>> import dis >>> a = range(10) >>> def func(a): ... return a[:] ... >>> def func2(a): ... return list(a) ... >>> dis.dis(func) 2 0 LOAD_FAST 0 (a) 3 SLICE+0 4 RETURN_VALUE >>> dis.dis(func2) 2 0 LOAD_GLOBAL 0 (list) 3 LOAD_FAST 0 (a) 6 CALL_FUNCTION 1 9 RETURN_VALUE </code></pre> Notice that <code>list</code> requires a <code>LOAD_GLOBAL</code> to find the function <code>list</code>. Looking up globals (and calling functions) in python is relatively slow. This would explain why <code>a[0:len(a)]</code> is also slower. Also remember that <code>list</code> needs to be able to handle arbitrary iterators whereas slicing doesn't. This means that <code>list</code> needs to allocate a new list, pack elements into that list as it iterates over the list and resize when necessary. There are a few things in here which are expensive -- resizing if necessary and iterating (effectively in python, not C). With the slicing method, you can calculate the size of the memory you'll need so can probably avoid resizing, and the iteration can be done completely in C (probably with a <code>memcpy</code> or something. disclaimer : I'm not a python dev, so I don't know how the internals of <code>list()</code> are implemented for sure. I'm just speculating based what I know of the specification. EDIT -- So I've looked at the source (with a little guidance from Martijn). The relevant code is in <code>listobject.c</code>. <code>list</code> calls <code>list_init</code> which then calls <code>listextend</code> at line 799. That function has some checks to see if it can use a fast branch if the object is a list or a tuple (line 812). Finally, the heavy lifting is done starting at line 834: <pre class="prettyprint"><code> src = PySequence_Fast_ITEMS(b); dest = self->ob_item + m; for (i = 0; i < n; i++) { PyObject *o = src[i]; Py_INCREF(o); dest[i] = o; } </code></pre> Compare that to the slice version which I think is defined in <code>list_subscript</code> (line 2544). That calls <code>list_slice</code> (line 2570) where the heavy lifting is done by the following loop (line 486): <pre class="prettyprint"><code> src = a->ob_item + ilow; dest = np->ob_item; for (i = 0; i < len; i++) { PyObject *v = src[i]; Py_INCREF(v); dest[i] = v; } </code></pre> They're pretty much the same code, so it's not surprising that the performance is almost the same for large lists (where the overhead of the small stuff like unpacking slices, looking up global variables, etc becomes less important) <hr> Here's how I would run the python tests (and the results for my Ubuntu system): <pre class="prettyprint"><code>$ python -m timeit -s 'a=range(30)' 'list(a)' 1000000 loops, best of 3: 0.39 usec per loop $ python -m timeit -s 'a=range(30)' 'a[:]' 10000000 loops, best of 3: 0.183 usec per loop $ python -m timeit -s 'a=range(30)' 'a[0:len(a)]' 1000000 loops, best of 3: 0.254 usec per loop </code></pre>

Efficient ways to duplicate array/list in Python

Tags:

python

arrays

ruby

Note: I'm a Ruby developer trying to find my way in Python.

When I wanted to figure out why some scripts use mylist[:] instead of list(mylist) to duplicate lists, I made a quick benchmark of the various methods to duplicate range(10) (see code below).

EDIT: I updated the tests to make use of Python's timeit as suggested below. This makes it impossible to directly compare it to Ruby, because timeit doesn't account for the looping while Ruby's Benchmark does, so Ruby code is for reference only.

Python 2.7.2

Array duplicating. Tests run 50000000 times
list(a)     18.7599430084
copy(a)     59.1787488461
a[:]         9.58828091621
a[0:len(a)] 14.9832749367

For reference, I wrote the same script in Ruby too:

Ruby 1.9.2p0

Array duplicating. Tests 50000000 times
                      user     system      total        real
Array.new(a)     14.590000   0.030000  14.620000 ( 14.693033)
Array[*a]        18.840000   0.060000  18.900000 ( 19.156352)
a.take(a.size)    8.780000   0.020000   8.800000 (  8.805700)
a.clone          16.310000   0.040000  16.350000 ( 16.384711)
a[0,a.size]       8.950000   0.020000   8.970000 (  8.990514)

Question 1: what is mylist[:] doing differently that it is 25 % faster than even mylist[0:len(mylist)]. Does it copy in memory directly or what?

Question 2: edit: updated benchmarks don't show huge differences in Python and Ruby anymore. was: Did I implement the tests in some obviously inefficient way, so that Ruby code is so much faster than Python?

Now the code listings:

Python:

import timeit

COUNT = 50000000

print "Array duplicating. Tests run", COUNT, "times"

setup = 'a = range(10); import copy'

print "list(a)\t\t", timeit.timeit(stmt='list(a)', setup=setup, number=COUNT)
print "copy(a)\t\t", timeit.timeit(stmt='copy.copy(a)', setup=setup, number=COUNT)
print "a[:]\t\t", timeit.timeit(stmt='a[:]', setup=setup, number=COUNT)
print "a[0:len(a)]\t", timeit.timeit(stmt='a[0:len(a)]', setup=setup, number=COUNT)

Ruby:

require 'benchmark'

a = (0...10).to_a

COUNT = 50_000_000

puts "Array duplicating. Tests #{COUNT} times"

Benchmark.bm(16) do |x|
  x.report("Array.new(a)")   {COUNT.times{ Array.new(a) }}
  x.report("Array[*a]")   {COUNT.times{ Array[*a] }}
  x.report("a.take(a.size)")   {COUNT.times{ a.take(a.size) }}
  x.report("a.clone")    {COUNT.times{ a.clone }}
  x.report("a[0,a.size]"){COUNT.times{ a[0,a.size] }}
end

780

asked Oct 24 '12 11:10

Laas

2 Answers

Use the timeit module in python for testing timings.

from copy import *

a=range(1000)

def cop():
    b=copy(a)

def func1():
    b=list(a)

def slice():
    b=a[:]

def slice_len():
    b=a[0:len(a)]



if __name__=="__main__":
    import timeit
    print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop")
    print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1")
    print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice")
    print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len")

Results:

copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277                   #winner
a[0:len(a)] 10.5431251526

It's surely the extra steps involved in a[0:len(a)] are the reason for it's slowness.

Here's the byte code comparison of the two:

In [19]: dis.dis(func1)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)
             15 SLICE+0             
             16 STORE_FAST               1 (b)
             19 LOAD_CONST               0 (None)
             22 RETURN_VALUE        

In [20]: dis.dis(func2)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)    #same up to here
             15 LOAD_CONST               2 (0)    #loads 0
             18 LOAD_GLOBAL              1 (len) # loads the builtin len(),
                                                 # so it might take some lookup time
             21 LOAD_FAST                0 (a)
             24 CALL_FUNCTION            1         
             27 SLICE+3             
             28 STORE_FAST               1 (b)
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE

121

answered Oct 20 '22 15:10

Ashwini Chaudhary

I can't comment on the ruby timing vs. the python timing. But I can comment on list vs. slice. Here's a quick inspection of the bytecode:

>>> import dis
>>> a = range(10)
>>> def func(a):
...     return a[:]
... 
>>> def func2(a):
...     return list(a)
... 
>>> dis.dis(func)
  2           0 LOAD_FAST                0 (a)
              3 SLICE+0             
              4 RETURN_VALUE        
>>> dis.dis(func2)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_FAST                0 (a)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE

Notice that list requires a LOAD_GLOBAL to find the function list. Looking up globals (and calling functions) in python is relatively slow. This would explain why a[0:len(a)] is also slower. Also remember that list needs to be able to handle arbitrary iterators whereas slicing doesn't. This means that list needs to allocate a new list, pack elements into that list as it iterates over the list and resize when necessary. There are a few things in here which are expensive -- resizing if necessary and iterating (effectively in python, not C). With the slicing method, you can calculate the size of the memory you'll need so can probably avoid resizing, and the iteration can be done completely in C (probably with a memcpy or something.

disclaimer : I'm not a python dev, so I don't know how the internals of list() are implemented for sure. I'm just speculating based what I know of the specification.

EDIT -- So I've looked at the source (with a little guidance from Martijn). The relevant code is in listobject.c. list calls list_init which then calls listextend at line 799. That function has some checks to see if it can use a fast branch if the object is a list or a tuple (line 812). Finally, the heavy lifting is done starting at line 834:

 src = PySequence_Fast_ITEMS(b);
 dest = self->ob_item + m;
 for (i = 0; i < n; i++) {
     PyObject *o = src[i];
     Py_INCREF(o);
     dest[i] = o;
 }

Compare that to the slice version which I think is defined in list_subscript (line 2544). That calls list_slice (line 2570) where the heavy lifting is done by the following loop (line 486):

 src = a->ob_item + ilow;
 dest = np->ob_item;
 for (i = 0; i < len; i++) {
     PyObject *v = src[i];
     Py_INCREF(v);
     dest[i] = v;
 }

They're pretty much the same code, so it's not surprising that the performance is almost the same for large lists (where the overhead of the small stuff like unpacking slices, looking up global variables, etc becomes less important)

Here's how I would run the python tests (and the results for my Ubuntu system):

$ python -m timeit -s 'a=range(30)' 'list(a)'
1000000 loops, best of 3: 0.39 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[:]'
10000000 loops, best of 3: 0.183 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[0:len(a)]'
1000000 loops, best of 3: 0.254 usec per loop

answered Oct 20 '22 16:10

mgilson

Related questions
                            
                                Why can't `virtualenv` find `pkg_resources`?
                            
                                Can't connect to MongoDB 2.0.5 database with pymongo 2.2
                            
                                Python: shortcut for writing decorators which accept arguments?
                            
                                Ubuntu AMI not loading user-data
                            
                                vim settings for python
                            
                                How to load a pyYAML file and access it using attributes instead of using the dictionary notation?
                            
                                Processing SQS queues with boto
                            
                                why is the construct x = (Condition and A or B) used?
                            
                                Automatic insertion of a colon after 'def', 'if' etc
                            
                                Where is the source code for PyPI, the Python package index? [closed]
                            
                                Python Gtk.Entry placeholder text
                            
                                Flask: How to get url for dynamically generated image file?
                            
                                How to deal with time values over 24 hours in python?
                            
                                How to set up a resource shared by several unit tests?
                            
                                Is it possible to generate correct PKCS12 (.pfx) file in Python?
                            
                                Frequency of global variables in python?
                            
                                Why is lambda asking for 2 arguments despite being given 2 arguments?
                            
                                WSGIPythonPath is not working
                            
                                Sphinx LaTeX markup limitations
                            
                                How to properly quit a program in python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With