Often when working with NumPy I find the distinction annoying: when I pull a vector or a row out of a matrix and then perform operations with np.arrays, there are usually problems.
To reduce headaches, I've taken to sometimes just using np.matrix (converting all np.arrays to np.matrix) just for simplicity. However, I suspect there are some performance implications. Could anyone comment as to what those might be and the reasons why?
It seems like, if they are both just arrays under the hood, element access is simply an offset calculation to get the value, so without reading through the entire source I'm not sure what the difference might be.
More specifically, what performance implications does this have:
v = np.matrix([1, 2, 3, 4])
# versus the below
w = np.array([1, 2, 3, 4])
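To make the "both just arrays under the hood" point and the row-extraction headache concrete, here is a minimal sketch. The variable names are only illustrative, and the slicing trick at the end is one possible way to keep a row 2-D with plain arrays, not necessarily the best one for every case.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
m = np.matrix(a)  # copies the data by default

# np.matrix is a subclass of np.ndarray, so the storage is the same kind of buffer.
is_subclass = issubclass(np.matrix, np.ndarray)

# Pulling out a row: the array drops a dimension, the matrix stays 2-D.
row_a = a[0]
row_m = m[0]
print(row_a.shape)  # (4,)
print(row_m.shape)  # (1, 4)

# With a plain array, slicing instead of integer indexing keeps the row 2-D.
row_2d = a[0:1]
print(row_2d.shape)  # (1, 4)

# np.asarray gives a plain ndarray view of the matrix without copying the data.
view = np.asarray(m)
shares = np.shares_memory(view, m)
```

So the data layout is shared between the two types; what differs is the Python-level behaviour layered on top of it.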
thanks
I added some more tests, and it appears that an array is considerably faster than a matrix when the arrays/matrices are small, but the difference gets smaller for larger data structures:
Small (2x4):
In [11]: a = [[1,2,3,4],[5,6,7,8]]
In [12]: aa = np.array(a)
In [13]: ma = np.matrix(a)
In [14]: %timeit aa.sum()
1000000 loops, best of 3: 1.77 us per loop
In [15]: %timeit ma.sum()
100000 loops, best of 3: 15.1 us per loop
In [16]: %timeit np.dot(aa, aa.T)
1000000 loops, best of 3: 1.72 us per loop
In [17]: %timeit ma * ma.T
100000 loops, best of 3: 7.46 us per loop
Larger (100x100):
In [19]: aa = np.arange(10000).reshape(100,100)
In [20]: ma = np.matrix(aa)
In [21]: %timeit aa.sum()
100000 loops, best of 3: 9.18 us per loop
In [22]: %timeit ma.sum()
10000 loops, best of 3: 22.9 us per loop
In [23]: %timeit np.dot(aa, aa.T)
1000 loops, best of 3: 1.26 ms per loop
In [24]: %timeit ma * ma.T
1000 loops, best of 3: 1.24 ms per loop
Notice that for the larger inputs, multiplication takes about the same time either way; the matrix version is even marginally faster here (1.24 ms vs. 1.26 ms).
I believe that what I am getting here is consistent with what @Jaime is explaining in the comments.
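For anyone who wants to rerun the comparison outside IPython, the session above can be approximated with the standard `timeit` module. This is only a sketch: `best_time` and `compare` are hypothetical helper names, the loop counts are arbitrary, and the numbers will depend on your machine and NumPy version.

```python
import timeit

import numpy as np


def best_time(fn, number=100, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


def compare(aa):
    """Time sum and self-multiplication for an array and its matrix copy."""
    ma = np.matrix(aa)
    return {
        "array sum": best_time(aa.sum),
        "matrix sum": best_time(ma.sum),
        "array dot": best_time(lambda: np.dot(aa, aa.T)),
        "matrix mul": best_time(lambda: ma * ma.T),
    }


small = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
large = np.arange(10000).reshape(100, 100)

for label, data in (("small", small), ("large", large)):
    for name, seconds in compare(data).items():
        print(f"{label:5s} {name:10s} {seconds * 1e6:10.2f} us per call")
```

Taking the minimum over several repeats mirrors the "best of 3" that `%timeit` reports in the session above.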
There is a general discussion on SciPy.org and on this question.
To compare performance, I did the following in IPython. It turns out that arrays are significantly faster.
In [1]: import numpy as np
In [2]: %%timeit
...: v = np.matrix([1, 2, 3, 4])
100000 loops, best of 3: 16.9 us per loop
In [3]: %%timeit
...: w = np.array([1, 2, 3, 4])
100000 loops, best of 3: 7.54 us per loop
Therefore NumPy arrays seem to be faster to construct than NumPy matrices, at least for small inputs like this one.
Versions used:
Numpy: 1.7.1
IPython: 0.13.2
Python: 2.7
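The construction timing above can also be checked from a plain script. The following is a rough sketch using the standard `timeit` module; the loop count is an arbitrary choice and the absolute numbers will differ from the ones quoted here.

```python
import timeit

import numpy as np

data = [1, 2, 3, 4]
number = 10000  # arbitrary loop count, chosen only to keep the run short

# Per-call construction time, in seconds, for each type.
t_matrix = timeit.timeit(lambda: np.matrix(data), number=number) / number
t_array = timeit.timeit(lambda: np.array(data), number=number) / number

print(f"np.matrix: {t_matrix * 1e6:.2f} us per call")
print(f"np.array:  {t_array * 1e6:.2f} us per call")

# The two constructors also produce different shapes from the same list.
v = np.matrix(data)
w = np.array(data)
print(v.shape)  # (1, 4)
print(w.shape)  # (4,)
```

Note that this only measures construction, so it says nothing about later operations on the resulting objects.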