I am trying to use stats.zscore() from scipy, and the following results confuse me.
Suppose I have an array and compute its z-scores in two different ways:
>>> import numpy as np
>>> from scipy import stats
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> a
array([[ 1.,  2.],
       [ 3.,  4.]])
First result:
>>> stats.zscore(a)
array([[-1., -1.],
       [ 1.,  1.]])
Second result:
>>> mean = np.mean(a)
>>> mean
2.5
>>> std = np.std(a)
>>> std
1.1180339887498949
>>> b = (a-mean)/std
>>> b
array([[-1.34164079, -0.4472136 ],
       [ 0.4472136 ,  1.34164079]])
The two results above differ. However, if I use a different array:
>>> c = np.array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,  0.1954, 0.6307, 0.6599,  0.1065,  0.0508])
>>> c
array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,  0.1954,  0.6307, 0.6599,  0.1065,  0.0508])
First result:
>>> stats.zscore(c)
array([ 1.12724554, -1.2469956 , -0.05542642,  1.09231569,  1.16645923, -0.8558472 ,  0.57858329,  0.67480514, -1.14879659, -1.33234306])
Second result:
>>> mean = np.mean(c)
>>> mean
0.45511999999999986
>>> std = np.std(c)
>>> std
0.30346538451691657
>>> b = (c-mean)/std
>>> b
array([ 1.12724554, -1.2469956 , -0.05542642,  1.09231569,  1.16645923, -0.8558472 ,  0.57858329,  0.67480514, -1.14879659, -1.33234306])
    
So when I use another array, the results become the same. Can someone help me understand what I did wrong in this? Thanks!
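Since nobody has added an answer and it seems to be correct, I will post Alex Riley's answer here.
stats.zscore() works along axis=0 by default, so for the 2D array it standardizes each column separately, whereas np.mean(a) and np.std(a) without an axis argument are computed over all elements of the array. The 1D array c has only one axis, which is why both approaches agree there.
To get the same result for the 2D array, pass axis=None:
from scipy import stats
stats.zscore(a, axis=None)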
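As a quick check, here is a minimal sketch that compares both axis settings against the manual formula, using the array `a` from the question. The variable names (`per_column`, `whole`, and so on) are only illustrative.

```python
import numpy as np
from scipy import stats

a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Default (axis=0): each column is standardized with its own mean and std.
per_column = stats.zscore(a)
manual_per_column = (a - a.mean(axis=0)) / a.std(axis=0)

# axis=None: the mean and std are taken over every element of the array.
whole = stats.zscore(a, axis=None)
manual_whole = (a - a.mean()) / a.std()

print(np.allclose(per_column, manual_per_column))  # column-wise versions agree
print(np.allclose(whole, manual_whole))            # whole-array versions agree
```

Both comparisons should print True: the default call matches the column-wise formula, and the axis=None call matches the (a-mean)/std computation from the question.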