Given a 2D numpy array, I need to compute the dot product of every column with itself, and store the result in a 1D array. The following works:
In [45]: A = np.array([[1,2,3,4],[5,6,7,8]])
In [46]: np.array([np.dot(A[:,i], A[:,i]) for i in range(A.shape[1])])
Out[46]: array([26, 40, 58, 80])
Is there a simple way to avoid the Python loop? The above is hardly the end of the world, but if there's a numpy primitive for this, I'd like to use it.
Edit: In practice the matrix has many rows and relatively few columns. I am therefore not overly keen on creating temporary arrays larger than O(A.shape[1]). I also can't modify A in place.
np.matmul and the @ operator both outperform np.dot. Also note that, as explained in the docs, np.matmul differs from dot in two important ways: multiplication by scalars is not allowed, and stacks of matrices are broadcast together as if the matrices were elements.
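For example, here is a quick sketch of those two differences (the arrays below are placeholders I made up, not from the original answer):

>>> import numpy as np
>>> A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> np.dot(A, 2)                 # a scalar operand is fine with dot
array([[ 2,  4,  6,  8],
       [10, 12, 14, 16]])
>>> # np.matmul(A, 2) would raise ValueError -- scalars are not allowed
>>> a = np.ones((3, 2, 4)); b = np.ones((3, 4, 5))
>>> np.matmul(a, b).shape        # the stack of 3 matrices is multiplied pairwise
(3, 2, 5)
>>> np.dot(a, b).shape           # dot contracts differently for >2-D inputs
(3, 2, 3, 5)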
NumPy performs operations element-by-element, so multiplying 2D arrays with * is not a matrix multiplication – it's an element-by-element multiplication. (The @ operator, available since Python 3.5, can be used for conventional matrix multiplication.)
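A minimal illustration of that distinction (small matrix of my own, not from the question):

>>> B = np.array([[1, 2], [3, 4]])
>>> B * B                        # element-by-element: each entry squared
array([[ 1,  4],
       [ 9, 16]])
>>> B @ B                        # conventional matrix product
array([[ 7, 10],
       [15, 22]])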
How about:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> (A*A).sum(axis=0)
array([26, 40, 58, 80])
EDIT: Hmm, okay, you don't want intermediate large objects. Maybe:
>>> from numpy.core.umath_tests import inner1d
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> inner1d(A.T, A.T)
array([26, 40, 58, 80])
which seems a little faster anyway. This should do what you want behind the scenes, as A.T is a view (which doesn't make its own copy, IIUC), and inner1d seems to loop the way it needs to. (Note that numpy.core.umath_tests was never a public API and has been deprecated in newer NumPy releases, so the einsum approach below is the safer long-term bet.)
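If you want to confirm the claim about A.T, here's a quick check (my own, not part of the original answer):

>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> A.T.base is A                # the transpose is just a view onto A's data
True
>>> np.shares_memory(A, A.T)     # no copy is made
True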
VERY BELATED UPDATE: Another alternative would be to use np.einsum:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> np.einsum('ij,ij->j', A, A)
array([26, 40, 58, 80])
>>> timeit np.einsum('ij,ij->j', A, A)
100000 loops, best of 3: 3.65 us per loop
>>> timeit inner1d(A.T, A.T)
100000 loops, best of 3: 5.02 us per loop
>>> A = np.random.randint(0, 100, (2, 100000))
>>> timeit np.einsum('ij,ij->j', A, A)
1000 loops, best of 3: 363 us per loop
>>> timeit inner1d(A.T, A.T)
1000 loops, best of 3: 848 us per loop
>>> (np.einsum('ij,ij->j', A, A) == inner1d(A.T, A.T)).all()
True
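In case the subscript string looks opaque: 'ij,ij->j' multiplies matching elements of the two operands and sums over every index that doesn't appear after the ->, i.e. over the rows i, leaving one value per column j. That's exactly the original loop, but computed in C and without building the full A*A temporary, which matters given the memory constraint in the question. A quick cross-check against the (A*A).sum approach above (my own, not from the answer):

>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> (np.einsum('ij,ij->j', A, A) == (A*A).sum(axis=0)).all()
True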
You can compute the square of all elements and sum up column-wise using np.sum(np.square(A), 0) (I'm not entirely sure about the second parameter of the sum function, which identifies the axis along which to take the sum, and I have no numpy currently installed. Maybe you'll have to experiment :) ...)
EDIT: Looking at DSM's post, it seems that you should use axis=0. Using the square function might be a little more performant than using A*A.
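For the record, running that suggestion against the example from the question (the answerer didn't have numpy at hand, so this is my own check):

>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> np.sum(np.square(A), 0)      # square element-wise, then sum down each column
array([26, 40, 58, 80])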