Why is a `for` over a Python list faster than over a Numpy array?

So, without telling a really long story: I was working on some code where I read in data from a binary file and then looped over every single point with a for loop. When I finished the code, it ran ridiculously slowly. I was looping over around 60,000 points from around 128 data channels, and this was taking a minute or more to process, which was way slower than I ever expected Python to run. I made the whole thing more efficient with Numpy, but while trying to figure out why the original version ran so slowly, we did some type checking and found that I was looping over Numpy arrays instead of Python lists. OK, no big deal: to make the inputs to our test setup the same, I converted the Numpy arrays to lists before looping. Bang, the same slow code that took a minute to run now took 10 seconds. I was floored. The only thing I had done was change a Numpy array to a Python list; I changed it back and it was slow as mud again. I couldn't believe it, so I went to get more definitive proof:

$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k+1"
100 loops, best of 3: 5.46 msec per loop

$ python -m timeit "for k in range(5000): k+1"
1000 loops, best of 3: 256 usec per loop

What is going on? I know that Numpy arrays and Python lists are different, but why is it so much slower to iterate over every point in an array?

I observed this behavior in both Python 2.6 and 2.7, running Numpy 1.10.1 I believe.
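
In case it helps, here's a minimal sketch of the kind of change I mean (the names are made up, not my actual code):

import numpy as np

# Stand-in for one channel of the data I was reading from the binary file.
data = np.arange(60000, dtype=np.float64)

# Original approach: loop directly over the numpy array -- slow.
total = 0.0
for x in data:
    total += x + 1

# The change that sped things up: convert to a plain list once, then loop.
points = data.tolist()
total = 0.0
for x in points:
    total += x + 1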

asked Feb 05 '16 by mechsin



1 Answer

We can do a little sleuthing to figure this out:

>>> import numpy as np
>>> a = np.arange(32)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
>>> a.data
<read-write buffer for 0x107d01e40, size 256, offset 0 at 0x107d199b0>
>>> id(a.data)
4433424176
>>> id(a[0])
4424950096
>>> id(a[1])
4424950096
>>> for item in a:
...   print id(item)
... 
4424950096
4424950120
4424950096
4424950120
... (the same two ids keep alternating for all 32 items)

So what is going on here? First, I took a look at the memory location of the array's memory buffer. It's at 4433424176. That in itself isn't too illuminating. However, numpy stores its data in a contiguous C array, so you might expect the first element of the numpy array to correspond to the memory address of that buffer. But it doesn't:

>>> id(a[0])
4424950096

and it's a good thing it doesn't, because that would break the invariant in Python that two objects never have the same id during their lifetimes.
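
As an aside, you can make the mismatch concrete by pulling the raw buffer address out of numpy's __array_interface__ (keeping in mind that id() being a memory address is a CPython implementation detail):

import numpy as np

a = np.arange(32)
buf_addr, readonly = a.__array_interface__['data']  # raw address of the C buffer
print(buf_addr)   # where the array's data actually lives
print(id(a[0]))   # address of the wrapper object -- not the same place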

So, how does numpy accomplish this? Well, the answer is that numpy has to wrap each returned element in a Python object (e.g. a numpy.float64, or a numpy.int64 in this case), which takes time if you're iterating item by item¹. Further proof of this shows up during iteration: we alternate between two separate ids while iterating over the array. This means that Python's memory allocator and garbage collector are working overtime to create new objects and then immediately free them.
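
You can see the wrapper type directly; the exact type is platform-dependent, but on a 64-bit Linux build it comes back as a numpy.int64:

import numpy as np

a = np.arange(32)
print(type(a[0]))           # a numpy scalar type, e.g. numpy.int64
print(type(a.tolist()[0]))  # a plain Python int -- no wrapper involved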

A list doesn't have this memory allocator/garbage collector overhead. The objects in the list already exist as Python objects (and they'll still exist after iteration), so neither plays any role in iterating over a list.
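
A quick sanity check of that claim: iterate a list twice and the ids are stable, because the elements are long-lived Python objects rather than freshly minted wrappers:

import numpy as np

lst = np.arange(5000).tolist()    # plain Python ints from here on
first_pass = [id(x) for x in lst]
second_pass = [id(x) for x in lst]
print(first_pass == second_pass)  # True: the same objects on every pass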

Timing methodology:

Also note, your timings are thrown off a little bit by your assumptions. You assumed that k + 1 takes about the same amount of time in both cases, but it doesn't. Notice what happens if I repeat your timings without doing any addition:

mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k"
1000 loops, best of 3: 233 usec per loop
mgilson$ python -m timeit "for k in range(5000): k"
10000 loops, best of 3: 114 usec per loop

there's only about a factor of 2 difference. Doing the addition, however, leads to a factor of 5 difference or so:

mgilson$ python -m timeit "for k in range(5000): k+1"
10000 loops, best of 3: 179 usec per loop
mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k+1"
1000 loops, best of 3: 786 usec per loop

For fun, let's just do the addition:

mgilson$ python -m timeit -s "v = 1" "v + 1"
10000000 loops, best of 3: 0.0261 usec per loop
mgilson$ python -m timeit -s "import numpy; v = numpy.int64(1)" "v + 1"
10000000 loops, best of 3: 0.121 usec per loop
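
Part of that gap is that arithmetic on a numpy scalar goes through numpy's machinery and hands back another freshly allocated numpy scalar, rather than a plain int:

import numpy as np

v = np.int64(1)
print(type(v + 1))  # numpy.int64 -- a new numpy scalar every time
print(type(1 + 1))  # int -- plain Python integer arithmetic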

And finally, your timeit also includes the list/array construction time, which isn't ideal:

mgilson$ python -m timeit -s "v = range(5000)" "for k in v: k"
10000 loops, best of 3: 80.2 usec per loop
mgilson$ python -m timeit -s "import numpy; v = numpy.arange(5000)" "for k in v: k"
1000 loops, best of 3: 237 usec per loop

Notice that numpy actually got even further away from the list solution in this case. This shows that the iteration itself really is slower, and that you might get some speedups if you convert the numpy types to standard Python types.
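
If you want to check that trade-off on your own machine, here's a sketch using the timeit module directly (absolute numbers will vary):

import timeit

setup = "import numpy; arr = numpy.arange(5000); lst = arr.tolist()"
arr_time = timeit.timeit("for k in arr: k", setup=setup, number=1000)
lst_time = timeit.timeit("for k in lst: k", setup=setup, number=1000)
print(arr_time)  # iterating the array pays for a wrapper object per element
print(lst_time)  # iterating the list is typically a few times faster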

¹ Note, this doesn't take a lot of time when slicing, because slicing only has to allocate O(1) new objects: numpy returns a view into the original array.
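
A quick way to see that slices are views rather than copies:

import numpy as np

a = np.arange(32)
s = a[4:12]          # one new view object, not 8 new element wrappers
print(s.base is a)   # True: the slice shares a's underlying buffer
s[0] = -1
print(a[4])          # -1: writing through the view changes the original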

answered Sep 19 '22 by mgilson