How to use scikit-learn PCA for features reduction and know which features are discarded

I am trying to run a PCA on a matrix of dimensions m x n where m is the number of features and n the number of samples.

Suppose I want to preserve the nf features with the maximum variance. With scikit-learn I am able to do it in this way:

from sklearn.decomposition import PCA

nf = 100
pca = PCA(n_components=nf)
# X is the matrix transposed (n samples on the rows, m features on the columns)
pca.fit(X)

X_new = pca.transform(X)

Now I get a new matrix X_new with shape n x nf. Is it possible to know which features have been discarded and which have been retained?
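(Side note, not part of the original question: after fitting, pca.explained_variance_ratio_ reports how much of the total variance each retained component captures, which is a quick sanity check for the choice of nf.)

print(pca.explained_variance_ratio_)        # variance fraction per retained component
print(pca.explained_variance_ratio_.sum())  # total fraction of variance kept by the nf components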

Thanks

asked Apr 25 '14 by gc5



2 Answers

The directions that your PCA object has determined during fitting are in pca.components_. The vector space orthogonal to the one spanned by pca.components_ is discarded.

Please note that PCA does not "discard" or "retain" any of your pre-defined features (encoded by the columns you specify). It mixes all of them (by weighted sums) to find orthogonal directions of maximum variance.

If this is not the behaviour you are looking for, then PCA dimensionality reduction is not the way to go. For some simple, general feature selection methods, take a look at sklearn.feature_selection.
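If you nevertheless want a per-feature notion of importance, one common heuristic (a sketch of one possible approach, not something stated in the answer above) is to inspect the absolute weights in pca.components_, whose rows are the loading vectors over the original columns:

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: n samples (rows) x m features (columns)
X = np.random.RandomState(0).randn(200, 10)

pca = PCA(n_components=3).fit(X)

# pca.components_ has shape (n_components, m); each row holds the weights
# ("loadings") of the original features in one principal direction.
loadings = np.abs(pca.components_)

# Heuristic: rank the original features by their largest absolute loading
# across the retained components.
importance = loadings.max(axis=0)
ranked = np.argsort(importance)[::-1]
print(ranked)  # indices of original columns, most influential first

This only ranks features by how strongly they feed into the retained directions; it does not turn PCA into a feature-selection method.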

answered by eickenberg


Projecting the features onto the principal components retains the important information (the axes with maximum variance) and drops the axes with small variance. This behaviour is more like compression than discarding.

X_proj would be a better name than X_new, because it is the projection of X onto the principal components.

You can compute the reconstruction X_rec as

X_rec = pca.inverse_transform(X_proj) # X_proj is originally X_new 

Here, X_rec is close to X, but the less important information has been dropped by PCA, so we can say X_rec is denoised.

In my opinion, what PCA discards is the noise, not any particular feature.
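A small sketch of that idea (assuming X is an n-samples-by-m-features array, as in the question): compare X with its reconstruction to see how much information was dropped.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).randn(200, 10)  # hypothetical data

pca = PCA(n_components=3).fit(X)
X_proj = pca.transform(X)              # projection onto the principal components
X_rec = pca.inverse_transform(X_proj)  # back-projection into the original feature space

# Mean squared reconstruction error: reflects the variance PCA dropped.
print(np.mean((X - X_rec) ** 2))
# Fraction of the total variance that was not kept by the 3 components.
print(1.0 - pca.explained_variance_ratio_.sum())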

answered by emeth