 

How to choose the number of components for PCA in scikit-learn

I'm trying to decompose my columns using PCA.

I'm having difficulty choosing n_components for the PCA class in scikit-learn (Python). This is what I did:

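from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Z = sc.fit_transform(X)  # standardize each column of X
pca = PCA(n_components=5)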

Can you please explain how to choose it?

Mathilde asked Sep 01 '25 00:09



1 Answer

No rule tells you with certainty what the correct number of components is; it depends on the application.
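When PCA feeds a supervised model, one way to make "application specific" concrete is to treat n_components as a hyperparameter and keep the value that gives the best cross-validated score on the downstream task. This is only a sketch: the digits data set, the LogisticRegression classifier and the candidate grid [5, 10, 20, 30, 40] are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: scikit-learn's bundled digits set.
X, y = load_digits(return_X_y=True)

# Scale, project with PCA, then classify (classifier choice is an assumption).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=2000)),
])

# Tune n_components like any other hyperparameter; the grid is illustrative.
search = GridSearchCV(pipe, {"pca__n_components": [5, 10, 20, 30, 40]}, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The "best" value here is whatever maximizes the classifier's accuracy, which can differ from the value picked by a variance threshold.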

However, there is a heuristic you can use: plot the cumulative explained variance ratio and choose the number of components that captures at least 95% of the variance. In the following example, on the digits data set, about 30 components capture roughly 95% of the variance.

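import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
pca = PCA().fit(load_digits().data)  # keep all components
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')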

[Figure: cumulative explained variance plotted against the number of components]
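Instead of reading the cut-off from the plot, you can compute it. A minimal sketch, assuming the same digits data and a 95% threshold: it finds the smallest number of components whose cumulative explained variance exceeds the threshold, then checks that scikit-learn's float form of n_components (a value between 0 and 1, used here with svd_solver="full") selects the same number. The same calls should work on the standardized matrix Z from the question.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data  # assumed example data

# Fit PCA with all components and accumulate the explained variance ratios.
pca = PCA(svd_solver="full").fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance exceeds the threshold.
threshold = 0.95
k = int(np.argmax(cumvar > threshold)) + 1
print(f"{k} components explain {cumvar[k - 1]:.3f} of the variance")

# A float 0 < n_components < 1 asks scikit-learn to apply the same rule itself.
pca95 = PCA(n_components=threshold, svd_solver="full").fit(X)
print(pca95.n_components_)
```

With a float n_components, the fitted model exposes the number it chose as n_components_, so you can pass the threshold directly instead of a fixed count like 5.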

Farseer answered Sep 03 '25 22:09
