subtracting the mean of each row in numpy with broadcasting

Tags: python, numpy

I am trying to subtract the mean of each row of a matrix in numpy using broadcasting, but I get an error. Any idea why?

Here is the code:

from numpy import *
X = random.rand(5, 10)
Y = X - X.mean(axis = 1)

Error:

ValueError: operands could not be broadcast together with shapes (5,10) (5,) 

Thanks!

Yuval Atzmon asked Aug 15 '15 23:08



2 Answers

The mean method is a reduction operation, meaning it converts a 1-d collection of numbers to a single number. When you apply a reduction to an n-dimensional array along an axis, numpy collapses that dimension to the reduced value, resulting in an (n-1)-dimensional array. In your case, since X has shape (5, 10), and you performed a reduction along axis 1, you end up with an array with shape (5,):

In [8]: m = X.mean(axis=1)

In [9]: m.shape
Out[9]: (5,)

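When you try to subtract this result from X, you are trying to subtract an array with shape (5,) from an array with shape (5, 10). These shapes are not compatible for broadcasting: numpy aligns shapes starting from the last axis, so the length 5 of m is compared with the length 10 of X's last axis, and the two are neither equal nor 1. (Take a look at the description of broadcasting in the User Guide.)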
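The shape mismatch can be reproduced in isolation. The following is a minimal sketch, not part of the original answer; the names `broadcast_ok` and `m_col` are chosen here purely for illustration.

```python
import numpy as np

X = np.random.rand(5, 10)
m = X.mean(axis=1)              # reduction collapses axis 1, shape (5,)

# Shapes are aligned from the last axis, so (5, 10) vs (5,) compares 10 with 5.
try:
    X - m
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# A trailing axis of length 1 gives shape (5, 1), which is compatible with (5, 10).
m_col = m[:, np.newaxis]
Y = X - m_col

print(broadcast_ok, m.shape, m_col.shape, Y.shape)
```

Only the column-shaped version of the means lines up with the rows of X, which is the point the rest of this answer builds on.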

For broadcasting to work the way you want, the result of the mean operation should be an array with shape (5, 1) (to be compatible with the shape (5, 10)). In numpy 1.7 and later, the reduction operations, including mean, accept an argument called keepdims that tells the function not to collapse the reduced dimension. Instead, a trivial dimension with length 1 is kept:

In [10]: m = X.mean(axis=1, keepdims=True)

In [11]: m.shape
Out[11]: (5, 1)

With older versions of numpy, you can use reshape to restore the collapsed dimension:

In [12]: m = X.mean(axis=1).reshape(-1, 1)

In [13]: m.shape
Out[13]: (5, 1)

So, depending on your version of numpy, you can do this:

Y = X - X.mean(axis=1, keepdims=True)

or this:

Y = X - X.mean(axis=1).reshape(-1, 1)
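
As a quick sanity check, the two variants can be compared against each other. This is a small sketch under the assumption that a seeded random matrix is a fair stand-in for your data; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the check is repeatable
X = rng.random((5, 10))

Y_keepdims = X - X.mean(axis=1, keepdims=True)
Y_reshape = X - X.mean(axis=1).reshape(-1, 1)

# Both variants should agree, and every row of the result should average to ~0.
same = np.allclose(Y_keepdims, Y_reshape)
row_means_zero = np.allclose(Y_keepdims.mean(axis=1), 0.0)

print(same, row_means_zero)
```

If either flag is False, the means were probably subtracted along the wrong axis.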
Warren Weckesser answered Nov 12 '22 07:11


If you are looking for performance, you can also consider np.einsum, which can be faster than np.sum or np.mean (see the runtime tests below). The desired output can be obtained like so -

X - np.einsum('ij->i',X)[:,None]/X.shape[1]

Please note that the [:,None] part plays the same role as keepdims: it adds a trailing axis of length 1, so the row sums have shape (n_rows, 1) instead of (n_rows,) and can be broadcast against X.
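
To make the equivalence concrete, here is a short sketch that checks the einsum expression against the keepdims version. The intermediate names `row_sums` and `row_means` are introduced here for readability and are not part of the one-liner above.

```python
import numpy as np

X = np.random.rand(5, 10)

# 'ij->i' sums over the column index j, leaving one value per row: shape (5,)
row_sums = np.einsum('ij->i', X)

# [:, None] adds a trailing axis of length 1; dividing by the column count gives the means
row_means = row_sums[:, None] / X.shape[1]

Y_einsum = X - row_means
Y_mean = X - X.mean(axis=1, keepdims=True)

matches = np.allclose(Y_einsum, Y_mean)
print(row_sums.shape, row_means.shape, matches)
```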

Runtime tests

1) Comparing just the mean calculation -

In [47]: X = np.random.rand(500, 1000)

In [48]: %timeit X.mean(axis=1, keepdims=True)
1000 loops, best of 3: 1.5 ms per loop

In [49]: %timeit X.mean(axis=1).reshape(-1, 1)
1000 loops, best of 3: 1.52 ms per loop

In [50]: %timeit np.einsum('ij->i',X)[:,None]/X.shape[1]
1000 loops, best of 3: 832 µs per loop

2) Comparing entire calculation -

In [52]: X = np.random.rand(500, 1000)

In [53]: %timeit X - X.mean(axis=1, keepdims=True)
100 loops, best of 3: 6.56 ms per loop

In [54]: %timeit X - X.mean(axis=1).reshape(-1, 1)
100 loops, best of 3: 6.54 ms per loop

In [55]: %timeit X - np.einsum('ij->i',X)[:,None]/X.shape[1]
100 loops, best of 3: 6.18 ms per loop
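
These numbers depend on the machine and the numpy version, so it is worth re-running them yourself. Outside IPython, something like the following sketch could be used; it relies on the standard-library timeit module, and the `candidates` dictionary and the loop counts are arbitrary choices made here, not part of the measurements above.

```python
import timeit
import numpy as np

X = np.random.rand(500, 1000)

# The three full computations timed above, wrapped as callables.
candidates = {
    "keepdims": lambda: X - X.mean(axis=1, keepdims=True),
    "reshape": lambda: X - X.mean(axis=1).reshape(-1, 1),
    "einsum": lambda: X - np.einsum('ij->i', X)[:, None] / X.shape[1],
}

number = 20   # calls per timing run (arbitrary)
results = {}
for name, fn in candidates.items():
    # Best of 3 runs, converted to seconds per call.
    best = min(timeit.repeat(fn, number=number, repeat=3))
    results[name] = best / number

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```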
Divakar answered Nov 12 '22 07:11