built-in range or numpy.arange: which is more efficient?

When iterating over a large array with a range expression, should I use Python's built-in range function, or numpy's arange to get the best performance?

My reasoning so far:

range is probably implemented natively and might therefore be faster. On the other hand, arange returns a full array, which occupies memory, so there might be an overhead. Python 3's range is a lazy sequence object (not strictly a generator) that does not hold all the values in memory.
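To make the memory argument concrete, here is a minimal sketch, assuming Python 3 and an installed numpy. The names n, lazy, dense, range_bytes and array_bytes are illustrative only, and the exact byte counts depend on the platform and interpreter build.

```python
import sys
import numpy as np

n = 1_000_000

lazy = range(n)        # Python 3: stores only start, stop and step
dense = np.arange(n)   # allocates all n integers up front

range_bytes = sys.getsizeof(lazy)   # size of the range object itself
array_bytes = dense.nbytes          # size of the array's data buffer

print("range object:", range_bytes, "bytes")
print("arange array:", array_bytes, "bytes")
```

The range object stays the same small size regardless of n, while the array buffer grows linearly with n.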

asked May 22 '12 09:05 by clstaudt



1 Answer

For large arrays, a vectorised numpy operation is the fastest. If you must loop, prefer xrange/range and avoid using np.arange.

In numpy you should use combinations of vectorized calculations, ufuncs and indexing to solve your problems, as these run at C speed. Looping over numpy arrays is inefficient by comparison.
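As a rough illustration of those three tools, the sketch below applies each to a small array and compares one result against an explicit Python loop. The variable names are made up for this example; the numpy calls (np.arange, np.sqrt, boolean-mask indexing) are standard.

```python
import numpy as np

values = np.arange(10)

squares = values ** 2              # vectorized arithmetic on the whole array
roots = np.sqrt(values)            # a ufunc, applied element by element in C
evens = values[values % 2 == 0]    # boolean-mask indexing, no explicit loop

# The equivalent pure-Python loop, for comparison of results only
loop_squares = [x ** 2 for x in range(10)]

print(squares.tolist() == loop_squares)
print(evens)
```

Each of the three numpy lines replaces what would otherwise be a Python-level for loop over the elements.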

(About the worst thing you could do is iterate over the array using an index created with range or np.arange, as the first sentence of your question suggests, but I'm not sure whether you really mean that.)

    import numpy as np
    import sys

    sys.version
    # out: '2.7.3rc2 (default, Mar 22 2012, 04:35:15) \n[GCC 4.6.3]'
    np.version.version
    # out: '1.6.2'

    size = int(1E6)

    %timeit for x in range(size): x ** 2
    # out: 10 loops, best of 3: 136 ms per loop

    %timeit for x in xrange(size): x ** 2
    # out: 10 loops, best of 3: 88.9 ms per loop

    # avoid this
    %timeit for x in np.arange(size): x ** 2
    # out: 1 loops, best of 3: 1.16 s per loop

    # use this
    %timeit np.arange(size) ** 2
    # out: 100 loops, best of 3: 19.5 ms per loop

So in this case numpy is about 4.5 times faster than the xrange loop (19.5 ms versus 88.9 ms) if you do it right. Depending on your problem, the speedup from numpy can be much larger than a factor of 4 or 5.
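The timings above come from Python 2.7 and numpy 1.6.2 in IPython. To repeat the comparison on Python 3, where xrange no longer exists, something like the following sketch with the standard timeit module should work. The function names, the timings dictionary and the reduced size are assumptions made for this example, and the absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

size = 100_000  # smaller than the 1E6 used above so the sketch finishes quickly

def loop_range():
    # Python-level loop over the built-in range
    for x in range(size):
        x ** 2

def loop_arange():
    # Python-level loop over a numpy array (the pattern to avoid)
    for x in np.arange(size):
        x ** 2

def vectorized():
    # One vectorized operation on the whole array
    return np.arange(size) ** 2

runs = 3
timings = {}
for name, func in [("range loop", loop_range),
                   ("arange loop", loop_arange),
                   ("vectorized", vectorized)]:
    # Best of 3 repeats, each timing `runs` calls; report seconds per call
    best = min(timeit.repeat(func, number=runs, repeat=3))
    timings[name] = best / runs

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per loop")
```

Taking the minimum over repeats mirrors the "best of 3" figure that %timeit reported in the original session.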

The answers to this question explain some more advantages of using numpy arrays instead of python lists for large data sets.

answered Sep 23 '22 08:09 by bmu