 

How to find the most contributing features in PCA?

I am running PCA on my data (~250 features) and see that all points are clustered in 3 blobs.

Is it possible to see which of the 250 features contribute most to the outcome? If so, how?

(using the scikit-learn implementation)

asked Oct 27 '16 by oshi2016


People also ask

How do you know what features are important in PCA?

The importance of each feature is reflected by the magnitude of the corresponding values in the eigenvectors (the higher the magnitude, the higher the importance). We can conclude that features 1, 3 and 4 are the most important for PC1. Similarly, we can state that features 2 and then 1 are the most important for PC2.
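
In scikit-learn, those eigenvectors are exposed as the rows of pca.components_. Here is a minimal sketch of ranking features by the magnitude of their loadings (the data and variable names are made up for illustration):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 5)                      # 100 samples, 5 made-up features

pca = PCA(n_components=2).fit(X)

# pca.components_ has shape (n_components, n_features); row k is the
# eigenvector defining PCk, so a larger |value| means a larger contribution.
for k, component in enumerate(pca.components_):
    ranked = np.argsort(np.abs(component))[::-1]
    print(f"PC{k + 1}: features ranked by |loading|: {ranked}")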

What is contribution in PCA?

The contribution is a scaled version of the squared correlation between variables and component axes (or the squared cosine, from a geometrical point of view). It is used to assess the quality of the representation of the variables on the principal component, and it is computed as cos²(variable, axis) × 100 / total cos² of ...
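
As a sketch of that formula in scikit-learn terms (the use of loadings and all variable names here are my assumptions, not part of the quoted definition): the squared cosine of a standardized variable is its squared loading, and each contribution is that value rescaled so the contributions to one axis sum to 100.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = StandardScaler().fit_transform(rng.rand(100, 5))   # standardized toy data

pca = PCA(n_components=2).fit(X)

# Loadings: eigenvectors scaled by the standard deviation of each component.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
cos2 = loadings ** 2                          # squared correlations (cos^2)
contrib = cos2 * 100 / cos2.sum(axis=0)       # each column sums to 100
print(contrib)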

How many principal components are required to explain 90% of the variance?

Here we see that our two-dimensional projection loses a lot of information (as measured by the explained variance) and that we'd need about 20 components to retain 90% of the variance. Looking at such a plot for a high-dimensional dataset can help you understand the level of redundancy present in multiple observations.
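
A short sketch of that check (synthetic data; the 90% threshold and variable names are assumptions for illustration):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 50)

pca = PCA().fit(X)                                   # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_90 = int(np.argmax(cumulative >= 0.90)) + 1        # first index reaching 90%
print(f"{n_90} components retain {cumulative[n_90 - 1]:.1%} of the variance")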

How do you choose the number of PCA components?

If our sole intention in doing PCA is data visualization, the best number of components is 2 or 3. If we really want to reduce the size of the dataset, the best number of principal components is much less than the number of variables in the original dataset.
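
scikit-learn can also pick the count for you: passing a float between 0 and 1 as n_components keeps just enough components to explain that fraction of the variance. A small sketch with synthetic data:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 50)

# A float n_components asks for the smallest number of components whose
# explained variance exceeds that fraction (requires svd_solver="full").
pca = PCA(n_components=0.90, svd_solver="full").fit(X)
print(pca.n_components_)                  # number of components actually kept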


1 Answer

Let's see what Wikipedia says:

PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

To see how 'influential' the vectors of the original space are in the smaller one, you have to project them as well, which is done by:

res = pca.transform(np.eye(D))  # pca: an already-fitted sklearn PCA; D: number of original features
  • np.eye(n) creates an n x n identity matrix (ones on the diagonal, zeros elsewhere).
  • Thus, np.eye(D) represents your features in the original feature space.
  • res is the projection of your features into the lower-dimensional space.

The interesting thing is that res is a D x d matrix where res[i][j] represents "how much feature i contributes to component j".

Then you can just sum over the columns to get a D x 1 vector (call it contribution), where each contribution[i] is the total contribution of feature i.

Sort it and you will find the most contributing features :)
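
Putting the whole recipe together as a runnable sketch (the dataset and variable names are invented for illustration, and summing absolute values is my addition, to keep positive and negative projections from cancelling):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
D = 10                                   # number of original features
X = rng.rand(100, D)

pca = PCA(n_components=3).fit(X)

# Project the feature axes (rows of the identity matrix) into PCA space.
res = pca.transform(np.eye(D))           # shape (D, d)

# Sum |projection| across components; abs() keeps opposite-signed
# entries from cancelling each other out.
contribution = np.abs(res).sum(axis=1)   # shape (D,)

# Features ranked from most to least contributing.
ranking = np.argsort(contribution)[::-1]
print(ranking)

One caveat: pca.transform subtracts the mean fitted on X, so every row of res is the corresponding row of pca.components_.T shifted by the same constant vector; reading pca.components_ directly avoids that offset.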

Not sure if that's clear; I could add additional information if needed.

Hope this helps, pltrdy

answered Oct 24 '22 by pltrdy