How to choose the number of components for PCA in scikit-learn

Question

I'm trying to decompose my columns using PCA.

I'm having difficulty choosing the n_components parameter of scikit-learn's PCA in Python. Here is what I did:

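from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Z = sc.fit_transform(X)  # standardize each column to zero mean and unit variance
pca = PCA(n_components=5)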

Can you please explain how to choose it?

Farseer · Accepted Answer

No rule will tell you with certainty what the correct number of components is; it is application-specific.
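If the reduced data feeds a downstream supervised task, one way to make the choice concrete is to treat n_components as a hyperparameter and keep the value with the best cross-validated score. This is a minimal sketch, not part of the original answer: the digits data, the LogisticRegression classifier, and the candidate grid are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: 1797 images of handwritten digits, 64 pixel features each.
X, y = load_digits(return_X_y=True)

# Scale, reduce, then classify; the classifier is an assumed downstream task.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Candidate values for n_components (an assumed grid); the "best" one is
# simply whichever gives the highest mean 5-fold cross-validated accuracy.
param_grid = {"pca__n_components": [5, 10, 20, 30, 40]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_["pca__n_components"], round(search.best_score_, 3))
```

The design choice here is that "best" is defined by the task's own metric rather than by variance alone, which is what makes the answer application-specific.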

However, there is a common heuristic you can use: plot the cumulative explained variance ratio and choose the number of components that "captures" at least 95% of the variance. In the following example, which uses the digits dataset, about 30 components capture around 95% of the variance.

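import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
digits = load_digits()
pca = PCA().fit(digits.data)  # keep all 64 components to see the full variance spectrum
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.gca().set(xlabel='number of components', ylabel='cumulative explained variance')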

(Plot: cumulative explained variance versus number of components)
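The 95% rule can also be applied without reading the value off the plot. A minimal sketch, assuming the digits data from the example and a 0.95 threshold: either find the smallest k whose cumulative ratio reaches the threshold, or pass a float between 0 and 1 as n_components, which scikit-learn supports with svd_solver="full" and which keeps the minimum number of components whose explained variance exceeds that fraction. The last lines apply the same threshold to standardized data, as in the question's StandardScaler code.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_digits().data  # 1797 samples x 64 features
threshold = 0.95        # assumed target fraction of explained variance

# Option 1: fit all components, then find the smallest k whose
# cumulative explained variance ratio reaches the threshold.
pca_full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
k = int(np.argmax(cumulative >= threshold)) + 1  # +1 because argmax returns a 0-based index
print(f"{k} components explain {cumulative[k - 1]:.3f} of the variance")

# Option 2: pass the fraction directly and let scikit-learn pick the count.
pca_auto = PCA(n_components=threshold, svd_solver="full").fit(X)
print(pca_auto.n_components_)

# Same threshold applied to standardized columns, as in the question.
pipe = make_pipeline(StandardScaler(), PCA(n_components=threshold, svd_solver="full"))
Z_reduced = pipe.fit_transform(X)
print(Z_reduced.shape)
```

Note that standardizing changes the variance of each column, so the number of components chosen on scaled data will generally differ from the unscaled count.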

Tags: python, scikit-learn, pca, decomposition

Mathilde
