
Numpy performance gap between len(arr) and arr.shape[0]

I've found that len(arr) is almost twice as fast as arr.shape[0] and am wondering why.

I am using Python 3.5.2, Numpy 1.14.2, IPython 6.3.1

The code below demonstrates this:

import numpy as np

arr = np.random.randint(1, 11, size=(3, 4, 5))

%timeit len(arr)
# 62.6 ns ± 0.239 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit arr.shape[0]
# 102 ns ± 0.163 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

I've also done some more tests for comparison:

class Foo:
    def __init__(self):
        self.shape = (3, 4, 5)

foo = Foo()

%timeit arr.shape
# 75.6 ns ± 0.107 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit foo.shape
# 61.2 ns ± 0.281 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit foo.shape[0]
# 78.6 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So I have two questions:

1) Why does len(arr) work faster than arr.shape[0]? (I would have thought len would be slower because of the function call.)

2) Why does foo.shape[0] work faster than arr.shape[0]? (In other words, what overhead do NumPy arrays incur in this case?)

MrPisarik asked Jul 13 '18 19:07


1 Answer

The NumPy array data structure is implemented in C, and the dimensions of the array are stored in a C structure, not in a Python tuple. So each time you read the shape attribute, a new Python tuple of new Python integer objects is created. When you use arr.shape[0], that tuple is then indexed to pull out the first element, which adds a little more overhead. len(arr), by contrast, only has to create a single Python integer.
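To see how the cost splits between building the tuple and indexing it, the timings can be reproduced outside IPython with the standard timeit module. This is only a rough sketch: the helper name best_ns and the repeat counts are my own choices, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

arr = np.random.randint(1, 11, size=(3, 4, 5))


def best_ns(stmt, number=200000, repeat=5):
    """Best observed per-call time for stmt, in nanoseconds (hypothetical helper)."""
    times = timeit.repeat(stmt, globals={"arr": arr}, number=number, repeat=repeat)
    return min(times) / number * 1e9


# Time the three expressions from the question separately.
statements = ("len(arr)", "arr.shape", "arr.shape[0]")
results = {stmt: best_ns(stmt) for stmt in statements}

for stmt in statements:
    print("{:<14s} {:7.1f} ns".format(stmt, results[stmt]))
```

If the explanation above holds, arr.shape alone should account for most of the cost of arr.shape[0], with the indexing step adding a smaller amount on top.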

You can easily verify that arr.shape creates a new tuple each time it is read:

In [126]: arr = np.random.randint(1, 11, size=(3, 4, 5))

In [127]: s1 = arr.shape

In [128]: id(s1)
Out[128]: 4916019848

In [129]: s2 = arr.shape

In [130]: id(s2)
Out[130]: 4909905024

s1 and s2 have different ids; they are different tuple objects.
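The same check can be written with the is operator, which also makes the contrast with the Foo class from the question visible. This is a small sketch; the variable names are mine, and it assumes an array with at least one dimension, as in the question.

```python
import numpy as np


class Foo:
    def __init__(self):
        # A plain Python tuple stored once on the instance.
        self.shape = (3, 4, 5)


arr = np.random.randint(1, 11, size=(3, 4, 5))
foo = Foo()

# Two reads of arr.shape give equal but distinct tuple objects.
s1, s2 = arr.shape, arr.shape
array_shape_reused = s1 is s2

# foo.shape just returns the tuple already stored in the instance dict.
foo_shape_reused = foo.shape is foo.shape

# Both spellings report the same length for the first axis.
same_length = len(arr) == arr.shape[0]

print(array_shape_reused, foo_shape_reused, same_length)
```

That difference is the likely source of the gap in the second question: foo.shape is an attribute lookup that hands back an existing object, while arr.shape has to build a tuple on every read.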

Warren Weckesser answered Oct 04 '22 02:10
