How to get the number of components needed in PCA with all extreme variance?

I am trying to determine how many PCA components I need for classification. I have read a similar question, "Finding the dimension with highest variance using scikit-learn PCA", and the scikit-learn documentation on this:

http://scikit-learn.org/dev/tutorial/statistical_inference/unsupervised_learning.html#principal-component-analysis-pca

However, this still did not answer my question. All of my PCA components have extremely large variances. Of course I could select all of them, but then PCA would be useless.

I also read the PCA reference in scikit-learn, http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html, which states:

if n_components == 'mle', Minka's MLE is used to guess the dimension

if 0 < n_components < 1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components

However, I cannot find any more information about using these techniques to choose n_components for PCA.

Here is my PCA code:

from sklearn.decomposition import PCA
pca = PCA()
pca.fit(x_array_train)
print(pca.explained_variance_)

result:

   [  6.58902714e+50   6.23266555e+49   2.93568652e+49   2.25418736e+49
       1.10063872e+49   3.25107359e+40   4.72113817e+39   1.40411862e+39
       4.03270198e+38   1.60662882e+38   3.20028861e+28   2.35570241e+27
       1.54944915e+27   8.05181151e+24   1.42231553e+24   5.05155955e+23
       2.90909468e+23   2.60339206e+23   1.95672973e+23   1.22987336e+23
       9.67133111e+22   7.07208772e+22   4.49067983e+22   3.57882593e+22
       3.03546737e+22   2.38077950e+22   2.18424235e+22   1.79048845e+22
       1.50871735e+22   1.35571453e+22   1.26605081e+22   1.04851395e+22
       8.88191944e+21   6.91581346e+21   5.43786989e+21   5.05544020e+21
       4.33110823e+21   3.18309135e+21   3.06169368e+21   2.66513522e+21
       2.57173046e+21   2.36482212e+21   2.32203521e+21   2.06033130e+21
       1.89039408e+21   1.51882514e+21   1.29284842e+21   1.26103770e+21
       1.22012185e+21   1.07857244e+21   8.55143095e+20   4.82321416e+20
       2.98301261e+20   2.31336276e+20   1.31712446e+20   1.05253795e+20
       9.84992112e+19   8.27574150e+19   4.66007620e+19   4.09687463e+19
       2.89855823e+19   2.79035170e+19   1.57015298e+19   1.39218538e+19
       1.00594159e+19   7.31960049e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.29043685e+18   5.29043685e+18   5.29043685e+18
       5.29043685e+18   5.24952686e+18   2.09685699e+18   4.16588190e+17]

I tried PCA(n_components='mle'), but I got this error:

Traceback (most recent call last):
  File "xx", line 166, in <module>
    pca.fit(x_array_train)
  File "xx", line 225, in fit
    self._fit(X)
  File "/Users/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 294, in _fit
    n_samples, n_features)
  File "/Users/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 98, in _infer_dimension_
    ll[rank] = _assess_dimension_(spectrum, rank, n_samples, n_features)
  File "/Users/lib/python2.7/site-packages/sklearn/decomposition/pca.py", line 83, in _assess_dimension_
    (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
ValueError: math domain error

I would really appreciate any help.

Yank asked Jun 12 '15 11:06


2 Answers

I am not using Python, but I did something similar to what you need in C++ with OpenCV. I hope you can convert it to whatever language you use.

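// Choose how many eigenvectors to keep: the smallest count whose
// eigenvalues add up to more than 80% of the total.
// mEiVal and sumOfEigens are computed beforehand (described below).
int nEigensOfInterest = mEiVal.rows;  // fall back to keeping everything
float sum = 0.0f;
for (int i = 0; i < mEiVal.rows; ++i)
{
    sum += mEiVal.at<float>(i, 0);
    if (((sum * 100) / (sumOfEigens)) > 80)
    {
        nEigensOfInterest = i + 1;  // i is zero-based, so the count is i + 1
        break;
    }
}
logfile << "No of Eigens of interest: " << nEigensOfInterest << std::endl << std::endl;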

The basic idea is to decide what percentage of the total variance the retained components must account for. I chose 80%. mEiVal is a column matrix of eigenvalues sorted in descending order, and sumOfEigens is the sum of all the eigenvalues.
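For a Python version of the same idea, here is a minimal sketch using NumPy. The function name n_components_for_variance and the threshold argument are made up for illustration, and the sample values are just the first five entries of the explained_variance_ output in the question (the remaining entries are many orders of magnitude smaller). Unlike the C++ loop, it stops at "at least" the threshold rather than "strictly more than".

```python
import numpy as np

def n_components_for_variance(eigenvalues, threshold=0.80):
    """Smallest k such that the k largest eigenvalues reach `threshold` of the total.

    Hypothetical helper for illustration; not part of scikit-learn or OpenCV.
    """
    # Sort in descending order so the largest eigenvalues come first.
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative_ratio = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First index whose cumulative ratio is >= threshold; +1 turns it into a count.
    k = int(np.searchsorted(cumulative_ratio, threshold)) + 1
    # Guard against rounding leaving the last ratio just below the threshold.
    return min(k, len(eigenvalues))

# First five values copied from the question's output.
leading = [6.58902714e+50, 6.23266555e+49, 2.93568652e+49,
           2.25418736e+49, 1.10063872e+49]
print(n_components_for_variance(leading, 0.80))
print(n_components_for_variance(leading, 0.95))
```

With scikit-learn, the same function could be fed pca.explained_variance_ after fitting PCA() with no n_components.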

I have no experience with scikit-learn. If this is not helpful, please let me know and I'll delete the answer.

Adorn answered Oct 19 '22 07:10


I'm just learning this myself, but it seems to me that the reference to using 0 < n_components < 1 suggests that you could set n_components to, say, 0.85, and exactly as many components as are needed to explain 85% of the variance will be used. You can verify that the right number of components is selected by also printing sum(pca.explained_variance_ratio_), which holds the fraction of variance explained by each kept component. You should get the smallest sum of variance fractions over 0.85 (or whatever value you chose) that's possible for your data.
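A small sketch of that check, assuming a reasonably recent scikit-learn. The data here is synthetic and only stands in for x_array_train. The attributes n_components_ and explained_variance_ratio_ are the fitted PCA attributes that report how many components were kept and what fraction of the variance each one explains.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for x_array_train (hypothetical data, not the asker's):
# six independent features with very different spreads.
rng = np.random.RandomState(0)
scales = np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.1])
X = rng.normal(size=(200, 6)) * scales

# A float between 0 and 1 asks PCA to keep just enough components
# to explain that fraction of the total variance.
pca = PCA(n_components=0.85)
pca.fit(X)

kept = pca.n_components_                          # number of components kept
explained = pca.explained_variance_ratio_.sum()   # fraction of variance they explain
print(kept, explained)
```

If explained comes out above 0.85 and dropping the last kept component would push it below, the selection behaved as described.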

There are more sophisticated ways to choose a number of components, of course, but a rule of thumb of 70% - 90% is a reasonable start.

Bob Smith answered Oct 19 '22 08:10