Given a 2D numpy array, I need to compute the dot product of every column with itself, and store the result in a 1D array. The following works:
In [45]: A = np.array([[1,2,3,4],[5,6,7,8]])
In [46]: np.array([np.dot(A[:,i], A[:,i]) for i in range(A.shape[1])])
Out[46]: array([26, 40, 58, 80])
Is there a simple way to avoid the Python loop? The above is hardly the end of the world, but if there's a numpy primitive for this, I'd like to use it.
Edit: In practice the matrix has many rows and relatively few columns. I am therefore not overly keen on creating temporary arrays larger than O(A.shape[1]). I also can't modify A in place.
np.matmul and the @ operator both outperform np.dot. Also note that, as explained in the docs, np.matmul differs from dot in two important ways: multiplication by scalars is not allowed, and stacks of matrices are broadcast together as if the matrices were elements.
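For example, here is a quick sketch of those two differences (the arrays below are placeholders I made up, not from the original answer):

>>> import numpy as np
>>> A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> np.dot(A, 2)                 # a scalar operand is fine with dot
array([[ 2,  4,  6,  8],
       [10, 12, 14, 16]])
>>> # np.matmul(A, 2) would raise ValueError -- scalars are not allowed
>>> a = np.ones((3, 2, 4)); b = np.ones((3, 4, 5))
>>> np.matmul(a, b).shape        # the stack of 3 matrices is multiplied pairwise
(3, 2, 5)
>>> np.dot(a, b).shape           # dot contracts differently for >2-D inputs
(3, 2, 3, 5)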
NumPy performs operations element-by-element, so multiplying 2D arrays with * is not a matrix multiplication – it's an element-by-element multiplication. (The @ operator, available since Python 3.5, can be used for conventional matrix multiplication.)
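A minimal illustration of that distinction (small matrix of my own, not from the question):

>>> B = np.array([[1, 2], [3, 4]])
>>> B * B                        # element-by-element: each entry squared
array([[ 1,  4],
       [ 9, 16]])
>>> B @ B                        # conventional matrix product
array([[ 7, 10],
       [15, 22]])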
How about:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> (A*A).sum(axis=0)
array([26, 40, 58, 80])
EDIT: Hmm, okay, you don't want intermediate large objects. Maybe:
>>> from numpy.core.umath_tests import inner1d
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> inner1d(A.T, A.T)
array([26, 40, 58, 80])
which seems a little faster anyway. This should do what you want behind the scenes, as A.T is a view (which doesn't make its own copy, IIUC), and inner1d seems to loop the way it needs to. (Note that numpy.core.umath_tests was never a public API and has been deprecated in newer NumPy releases, so the einsum approach below is the safer long-term bet.)
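If you want to confirm the claim about A.T, here's a quick check (my own, not part of the original answer):

>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> A.T.base is A                # the transpose is just a view onto A's data
True
>>> np.shares_memory(A, A.T)     # no copy is made
True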
VERY BELATED UPDATE: Another alternative would be to use np.einsum:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> np.einsum('ij,ij->j', A, A)
array([26, 40, 58, 80])
>>> timeit np.einsum('ij,ij->j', A, A)
100000 loops, best of 3: 3.65 us per loop
>>> timeit inner1d(A.T, A.T)
100000 loops, best of 3: 5.02 us per loop
>>> A = np.random.randint(0, 100, (2, 100000))
>>> timeit np.einsum('ij,ij->j', A, A)
1000 loops, best of 3: 363 us per loop
>>> timeit inner1d(A.T, A.T)
1000 loops, best of 3: 848 us per loop
>>> (np.einsum('ij,ij->j', A, A) == inner1d(A.T, A.T)).all()
True
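In case the subscript string looks opaque: 'ij,ij->j' multiplies matching elements of the two operands and sums over every index that doesn't appear after the ->, i.e. over the rows i, leaving one value per column j. That's exactly the original loop, but computed in C and without building the full A*A temporary, which matters given the memory constraint in the question. A quick cross-check against the (A*A).sum approach above (my own, not from the answer):

>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> (np.einsum('ij,ij->j', A, A) == (A*A).sum(axis=0)).all()
True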
You can compute the square of all elements and sum up column-wise using np.sum(np.square(A), 0) (I'm not entirely sure about the second parameter of the sum function, which identifies the axis along which to take the sum, and I have no numpy currently installed. Maybe you'll have to experiment :) ...)
EDIT: Looking at DSM's post, it seems that you should use axis=0. Using the square function might be a little more performant than using A*A.
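For the record, running that suggestion against the example from the question (the answerer didn't have numpy at hand, so this is my own check):

>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> np.sum(np.square(A), 0)      # square element-wise, then sum down each column
array([26, 40, 58, 80])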