I'm trying to decompose my columns using PCA.
I'm having difficulty choosing n_components for the PCA class in scikit-learn (Python). This is what I did:
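from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Z = sc.fit_transform(X)  # standardize each column of X
pca = PCA(n_components=5)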
Could you please explain how to choose it?
No answer will tell you with certainty what the correct number of components is. It is application specific.
However, there is a heuristic you can use: plot the cumulative explained variance ratio and choose the smallest number of components that "captures" at least 95% of the variance. In the following example, the number of components that captures around 95% of the variance is around 30.
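import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
pca = PCA().fit(load_digits().data)  # assumes `digits` was scikit-learn's digits dataset
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')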
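If you would rather not read the number off the plot, the same heuristic can be computed directly. The following is only a minimal sketch: it assumes the 95% threshold from above and scikit-learn's digits dataset, and the names threshold, cumulative and n_components are just illustrative. It also shows that passing a float between 0 and 1 as n_components asks PCA to choose the count for that variance fraction.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data  # assumed example data; replace with your own matrix

# Fit with all components, then take the first count whose cumulative
# explained variance ratio reaches the threshold.
threshold = 0.95  # heuristic value, not a universal rule
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, threshold) + 1)

# Alternatively, a float in (0, 1) lets PCA select the count itself.
pca = PCA(n_components=threshold, svd_solver="full").fit(X)

print(n_components, pca.n_components_)
```

Whether 95% is the right threshold still depends on what you do with the components afterwards, so treat the result as a starting point.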