I am trying to standardize a numpy array of shape (M, N) so that each column has a mean of 0. I think I have applied the standardization formula correctly, where x is the random variable and z is the standardized version of x.
z = (x - mean(x)) / std(x)
But the column means of the resulting array are not 0. They are very small numbers, but not exactly zero. Any insight into my misunderstanding or mistake is welcome. Here is my code:
import numpy as np
X = np.load('data/filename.npy').astype('float')
XNormed = (X - np.mean(X, axis=0))/np.std(X, axis=0)
column_mean = np.mean(XNormed, axis=0)
print(column_mean)
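Your code is correct: it already divides by the standard deviation, not by the range of the data, exactly as in the formula in your question. The column means come out as very small numbers rather than exactly 0 because of floating-point rounding, so treat them as zero within a tolerance (for example with np.allclose) rather than expecting exact equality. The line below is the same computation written with the array's own mean() and std() methods. Keep axis=0, since without it numpy uses the mean and standard deviation of the whole array and the individual column means will generally not be 0:
XNormed = (X - X.mean(axis=0)) / X.std(axis=0)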
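As a quick check that the tiny nonzero column means come from floating-point rounding and not from the formula, here is a minimal sketch. The random array is only a stand-in for the file loaded in the question, and the seed, shape, and tolerance are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for the data loaded in the question (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))

# Per-column standardization, as in the question: z = (x - mean(x)) / std(x).
X_normed = (X - X.mean(axis=0)) / X.std(axis=0)

column_mean = X_normed.mean(axis=0)
column_std = X_normed.std(axis=0)

# The means are tiny but typically not exactly 0.0 because of rounding.
print(column_mean)

# Compare against 0 and 1 within a tolerance instead of testing exact equality.
means_ok = np.allclose(column_mean, 0.0, atol=1e-12)
stds_ok = np.allclose(column_std, 1.0)
print(means_ok, stds_ok)
```

If both checks pass on your own data, the standardization is doing what the formula says and the leftover values can be treated as zero.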