I have a numpy matrix A where the data is organised column-vector-wise, i.e. A[:,0] is the first data vector, A[:,1] is the second, and so on. I wanted to know whether there is a more elegant way to zero out the mean from this data. I am currently doing it via a for loop:
mean = A.mean(axis=1)
for k in range(A.shape[1]):
    A[:, k] = A[:, k] - mean
So does numpy provide a function to do this? Or can it be done more efficiently another way?
numpy.mean computes the arithmetic mean along the specified axis and returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis.
To compute a weighted average of a numpy array, use numpy.average(), which calculates the weighted mean along the specified axis. Without weights it gives the same result as the plain mean.
Use numpy.ndarray.mean() to calculate mean values across dimensions in an array: call mean(axis=x) with x as 0 and then 1 to get the mean of each column and then of each row.
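As a quick illustration of the axis argument described above, here is a minimal sketch. The array X and the weights are made-up values chosen only for this example.

```python
import numpy as np

# Hypothetical 2 x 3 array, used only for illustration.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

overall = X.mean()          # mean over the flattened array
col_means = X.mean(axis=0)  # one value per column, shape (3,)
row_means = X.mean(axis=1)  # one value per row, shape (2,)

# np.average accepts weights; here the last column counts twice as much.
weighted = np.average(X, axis=1, weights=[1, 1, 2])

print(overall, col_means, row_means, weighted)
```

Note that reducing along an axis drops that axis from the result, which is why the mean has to be reshaped before it can be subtracted along axis 1.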
As is typical, you can do this a number of ways. Each of the approaches below works by adding a dimension to the mean vector, making it a 4 x 1 array, and then NumPy's broadcasting takes care of the rest. Each approach creates a view of mean, rather than a deep copy. The first approach (i.e., using newaxis) is likely preferred by most, but the other methods are included for the record.
In addition to the approaches below, see also ovgolovin's answer, which uses a NumPy matrix to avoid the need to reshape mean altogether.
For the methods below, we start with the following code and example array A.
import numpy as np
A = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
mean = A.mean(axis=1)
numpy.newaxis
>>> A - mean[:, np.newaxis]
array([[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.]])
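To make the broadcasting step explicit, the sketch below checks the shapes involved and confirms that the reshaped mean shares memory with the original (i.e., it is a view). It reuses the example array from above; the names col and centered are introduced here only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)        # shape (4,): one mean per row
col = mean[:, np.newaxis]    # shape (4, 1): same data, extra axis

print(mean.shape, col.shape)

# The reshaped array is a view: no data is copied.
print(np.shares_memory(mean, col))

# (4, 3) - (4, 1) broadcasts the single column across all 3 columns of A.
centered = A - col

# Without the extra axis, (4, 3) - (4,) does not broadcast.
try:
    A - mean
except ValueError as exc:
    print("broadcast error:", exc)
```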
None
The documentation states that None can be used instead of newaxis. This is because
>>> np.newaxis is None
True
Therefore, the following accomplishes the task.
>>> A - mean[:, None]
array([[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.]])
That said, newaxis is clearer and should be preferred. Also, a case can be made that newaxis is more future-proof. See also: Numpy: Should I use newaxis or None?
ndarray.reshape
>>> A - mean.reshape((mean.shape[0], 1))
array([[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.]])
ndarray.shape directly
You can alternatively change the shape of mean directly.
>>> mean.shape = (mean.shape[0], 1)
>>> A - mean
array([[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.]])
You can also use matrix instead of array. Then you won't need to reshape:
>>> A = np.matrix([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
>>> m = A.mean(axis=1)
>>> A - m
matrix([[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.],
[-1., 0., 1.]])
Yes. pylab.demean:
In [1]: X = scipy.rand(2,3)
In [2]: X.mean(axis=1)
Out[2]: array([ 0.42654669, 0.65216704])
In [3]: Y = pylab.demean(X, axis=1)
In [4]: Y.mean(axis=1)
Out[4]: array([ 1.85037171e-17, 0.00000000e+00])
Source:
In [5]: pylab.demean??
Type: function
Base Class: <type 'function'>
String Form: <function demean at 0x38492a8>
Namespace: Interactive
File: /usr/lib/pymodules/python2.7/matplotlib/mlab.py
Definition: pylab.demean(x, axis=0)
Source:
def demean(x, axis=0):
    "Return x minus its mean along the specified axis"
    x = np.asarray(x)
    if axis == 0 or axis is None or x.ndim <= 1:
        return x - x.mean(axis)
    ind = [slice(None)] * x.ndim
    ind[axis] = np.newaxis
    return x - x.mean(axis)[ind]
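If pylab.demean is not available in your Matplotlib version (I believe it has been removed from recent releases), the same idea is easy to write by hand. The following is only a sketch: demean_sketch is a hypothetical helper name, not a library function, and it relies on the keepdims argument of ndarray.mean instead of building an index list.

```python
import numpy as np

def demean_sketch(x, axis=0):
    """Return x minus its mean along `axis` (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the reduced axis with length 1,
    # so the mean broadcasts against x for any choice of axis.
    return x - x.mean(axis=axis, keepdims=True)

# Made-up 2 x 3 input, used only to exercise the function.
X = np.arange(6.0).reshape(2, 3)
Y = demean_sketch(X, axis=1)
print(Y)
print(Y.mean(axis=1))
```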
Looks like some of these answers are pretty old. I just tested this on numpy 1.13.3:
>>> import numpy as np
>>> a = np.array([[1,1,3],[1,0,4],[1,2,2]])
>>> a
array([[1, 1, 3],
[1, 0, 4],
[1, 2, 2]])
>>> a = a - a.mean(axis=0)
>>> a
array([[ 0., 0., 0.],
[ 0., -1., 1.],
[ 0., 1., -1.]])
I think this is much cleaner and simpler. Have a try and let me know if this is somehow inferior to the other answers.
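One caveat worth checking: the plain subtraction lines up with axis=0, whereas the question asks about axis=1. On a square array the axis=1 version runs without error but subtracts along the wrong axis. The sketch below uses a made-up 3 x 3 array (not the one above) to show the difference, and uses keepdims=True as one possible fix.

```python
import numpy as np

# Hypothetical square array whose rows have different means.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]], dtype=float)

# axis=0: the column means have shape (3,), matching the last axis of a.
by_col = a - a.mean(axis=0)

# axis=1: the row means also have shape (3,), so this broadcasts silently,
# but each row mean is subtracted from a column instead of from its row.
wrong = a - a.mean(axis=1)

# keepdims=True gives shape (3, 1), so each row gets its own mean removed.
right = a - a.mean(axis=1, keepdims=True)

print(wrong.mean(axis=1))  # not all zero
print(right.mean(axis=1))  # zero up to rounding
```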