I have the following code:
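import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# df is the DataFrame loaded earlier (loading code not shown)
df.columns = ['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True)  # drops the empty line at file-end

# .ix has been removed from recent pandas versions; .iloc selects by position
X = df.iloc[:, 0:4].values  # the four feature columns
y = df.iloc[:, 4].values    # the class label column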
Next I am scaling the data and get the mean values:
X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)
What I do not get is that my output is this:
[ -4.73695157e-16 -6.63173220e-16 3.31586610e-16 -2.84217094e-16]
I do not understand how these values can be anything other than 0. If I scale the data, the mean should be exactly 0, right?
Could anyone explain to me what happens here?
In practice those values are so close to 0 that you can consider them to be 0.
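The scaler subtracts each column's mean so that the result is centered at zero, but floating-point numbers have finite precision. Small rounding errors creep into the subtraction, the division, and the final averaging, so the computed mean ends up extremely close to 0 rather than exactly 0.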
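To see the effect without the original data file, here is a minimal sketch using NumPy only. The random data is a hypothetical stand-in for the iris features, and the manual standardization (subtract the column mean, divide by the population standard deviation) is assumed to mirror what StandardScaler does with its default settings. The point is that the means should be compared against zero with a tolerance, not with ==.

```python
import numpy as np

# Hypothetical stand-in data: 150 rows, 4 features, fixed seed for repeatability.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4))

# Standardize each column by hand: subtract the column mean and divide by
# the population standard deviation (ddof=0, NumPy's default).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

mean_vec = X_std.mean(axis=0)
print(mean_vec)  # tiny values, not necessarily exactly 0

# Compare against zero with an absolute tolerance instead of using ==.
means_are_zero = np.allclose(mean_vec, 0.0, atol=1e-12)
print(means_are_zero)
```

If the tiny values are distracting when printed, something like np.round(mean_vec, 10) is one way to display them as zeros.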
See this question on the precision of floating-point arithmetic.
Also relevant is the concept of machine epsilon, which for float64 is about 2.22e-16. That is the same order of magnitude as the values in your output.
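As a rough illustration of machine epsilon, the sketch below uses only the Python standard library. Python floats are 64-bit doubles, the same format as NumPy's float64, and the variable names are just for illustration.

```python
import math
import sys

# Machine epsilon for 64-bit floats: the gap between 1.0 and the next
# larger representable number (about 2.22e-16).
eps = sys.float_info.epsilon
print(eps)

# Adding a full epsilon to 1.0 gives a different number...
adding_eps_changes_one = (1.0 + eps) != 1.0
# ...but adding half of it is rounded away entirely.
adding_half_eps_is_lost = (1.0 + eps / 2) == 1.0

# The same rounding is why exact comparisons on floats are unreliable.
sum_is_exact = (0.1 + 0.2) == 0.3               # False
sum_is_close = math.isclose(0.1 + 0.2, 0.3)     # True

print(adding_eps_changes_one, adding_half_eps_is_lost, sum_is_exact, sum_is_close)
```

Values a few multiples of eps in size, like the ones in the question, are what rounding error typically looks like for numbers of magnitude around 1.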