I'm trying to subtract the mean of each row of a matrix in numpy using broadcasting, but I get an error. Any idea why?
Here is the code:
from numpy import *
X = random.rand(5, 10)
Y = X - X.mean(axis = 1)
Error:
ValueError: operands could not be broadcast together with shapes (5,10) (5,)
Thanks!
Subtracting two matrices in NumPy is a common task. The most straightforward way is the - operator, which is shorthand for np.subtract(), the NumPy function for elementwise subtraction of arrays and other array-like objects such as matrices.
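For instance, a minimal sketch showing the two are equivalent for same-shaped arrays (the names A and B are just illustrative):

import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.ones((2, 3))

# The - operator and np.subtract produce the same elementwise result.
print(np.array_equal(A - B, np.subtract(A, B)))  # True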
numpy.mean(arr, axis=None) computes the arithmetic mean (average) of the given array elements along the specified axis.
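A short illustration of the axis argument (arr is a throwaway example array):

import numpy as np

arr = np.arange(12).reshape(3, 4)
print(arr.mean())        # grand mean over all elements (axis=None)
print(arr.mean(axis=0))  # column means, shape (4,)
print(arr.mean(axis=1))  # row means, shape (3,)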
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
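For example, a dimension of length 1 is stretched to match the other operand, as in this sketch:

import numpy as np

big = np.zeros((5, 10))
col = np.arange(5).reshape(5, 1)   # shape (5, 1)

# The length-1 second axis of col is stretched across the 10 columns of big.
print((big + col).shape)           # (5, 10)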
The mean method is a reduction operation, meaning it converts a 1-d collection of numbers to a single number. When you apply a reduction to an n-dimensional array along an axis, NumPy collapses that dimension to the reduced value, resulting in an (n-1)-dimensional array. In your case, since X has shape (5, 10) and you performed a reduction along axis 1, you end up with an array with shape (5,):
In [8]: m = X.mean(axis=1)
In [9]: m.shape
Out[9]: (5,)
When you try to subtract this result from X, you are subtracting an array with shape (5,) from an array with shape (5, 10). These shapes are not compatible for broadcasting. (See the description of broadcasting in the NumPy User Guide.)
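Broadcasting aligns shapes starting from the trailing dimensions, so a (10,) array would have broadcast against (5, 10) while a (5,) array does not. A minimal sketch (the names row and col are just illustrative):

import numpy as np

X = np.random.rand(5, 10)
row = np.random.rand(10)   # shape (10,) aligns with the trailing axis
col = np.random.rand(5)    # shape (5,) does not

print((X - row).shape)     # (5, 10) -- broadcasts fine
try:
    X - col
except ValueError as e:
    print(e)               # operands could not be broadcast together ...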
For broadcasting to work the way you want, the result of the mean operation should be an array with shape (5, 1), which is compatible with the shape (5, 10). In recent versions of numpy, the reduction operations, including mean, have an argument called keepdims that tells the function not to collapse the reduced dimension. Instead, a trivial dimension with length 1 is kept:
In [10]: m = X.mean(axis=1, keepdims=True)
In [11]: m.shape
Out[11]: (5, 1)
With older versions of numpy, you can use reshape to restore the collapsed dimension:
In [12]: m = X.mean(axis=1).reshape(-1, 1)
In [13]: m.shape
Out[13]: (5, 1)
So, depending on your version of numpy, you can do this:
Y = X - X.mean(axis=1, keepdims=True)
or this:
Y = X - X.mean(axis=1).reshape(-1, 1)
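As a sanity check, both variants produce the same result, and each row of Y then has (numerically) zero mean; a quick sketch:

import numpy as np

X = np.random.rand(5, 10)
Y1 = X - X.mean(axis=1, keepdims=True)
Y2 = X - X.mean(axis=1).reshape(-1, 1)

print(np.allclose(Y1, Y2))                # True: both approaches agree
print(np.allclose(Y1.mean(axis=1), 0.0))  # True: each row is now centered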
If you are looking for performance, you can also consider np.einsum, which is reportedly faster than np.sum or np.mean. The desired output could then be obtained like so -
X - np.einsum('ij->i',X)[:,None]/X.shape[1]
Please note that the [:,None] part plays the same role as keepdims: it keeps the number of dimensions equal to that of the input array, so the result can be broadcast against X.
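A quick check that the einsum expression (a row sum divided by the row length) matches the keepdims-based mean:

import numpy as np

X = np.random.rand(5, 10)
m_einsum = np.einsum('ij->i', X)[:, None] / X.shape[1]
m_mean = X.mean(axis=1, keepdims=True)

print(np.allclose(m_einsum, m_mean))  # True: same row means, shape (5, 1)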
Runtime tests
1) Comparing just the mean calculation -
In [47]: X = np.random.rand(500, 1000)
In [48]: %timeit X.mean(axis=1, keepdims=True)
1000 loops, best of 3: 1.5 ms per loop
In [49]: %timeit X.mean(axis=1).reshape(-1, 1)
1000 loops, best of 3: 1.52 ms per loop
In [50]: %timeit np.einsum('ij->i',X)[:,None]/X.shape[1]
1000 loops, best of 3: 832 µs per loop
2) Comparing entire calculation -
In [52]: X = np.random.rand(500, 1000)
In [53]: %timeit X - X.mean(axis=1, keepdims=True)
100 loops, best of 3: 6.56 ms per loop
In [54]: %timeit X - X.mean(axis=1).reshape(-1, 1)
100 loops, best of 3: 6.54 ms per loop
In [55]: %timeit X - np.einsum('ij->i',X)[:,None]/X.shape[1]
100 loops, best of 3: 6.18 ms per loop