Finding the dimension with highest variance using scikit-learn PCA

I need to use PCA to identify the dimensions with the highest variance in a certain data set. I'm using scikit-learn's PCA to do it, but I can't tell from the output of the PCA method which components of my data have the highest variance. Keep in mind that I don't want to eliminate those dimensions, only identify them.

My data is organized as a matrix with 150 rows, each with 4 dimensions. I'm doing the following:

import sklearn.decomposition

pca = sklearn.decomposition.PCA()
pca.fit(data_matrix)

When I print pca.explained_variance_ratio_, it outputs an array of variance ratios ordered from highest to lowest, but it doesn't tell me which dimension of the data each one corresponds to (I've tried changing the order of the columns in my matrix, and the resulting variance ratio array was the same).
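The behaviour described above is expected: explained_variance_ratio_ holds one value per principal component, not per original column, so reordering the columns leaves it unchanged. The minimal sketch below checks that on hypothetical synthetic data (150 rows, 4 independent columns with made-up standard deviations 1, 5, 0.5 and 2, generated with np.random.default_rng); it is not the asker's actual data. For contrast, it also computes the plain per-column variance, which does follow the column order.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the asker's data: 150 samples, 4 independent
# features with deliberately different spreads (std devs 1, 5, 0.5, 2).
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(150, 4)) * np.array([1.0, 5.0, 0.5, 2.0])

# Ratios for the original column order
ratios = PCA().fit(data_matrix).explained_variance_ratio_

# Same data with the columns in reverse order
reversed_cols = data_matrix[:, ::-1]
ratios_reversed = PCA().fit(reversed_cols).explained_variance_ratio_

# One ratio per principal component, sorted from highest to lowest;
# reordering the columns does not change them.
print(ratios)
print(np.allclose(ratios, ratios_reversed))  # True

# The variance of each original column, by contrast, is tied to the column
# order; argsort ranks the original columns from highest to lowest variance.
column_variance = data_matrix.var(axis=0)
print(column_variance)
print(np.argsort(column_variance)[::-1])
```

If the goal is only to rank the original columns by their own variance, the last two lines do that without PCA; PCA answers a different question, namely which directions (combinations of columns) carry the most variance.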

Printing pca.components_ gives me a 4x4 matrix (I kept the original number of components as the argument to PCA) with values whose meaning I can't understand. According to scikit-learn's documentation, they should be the components with the maximum variance (the eigenvectors, perhaps?), but there is no sign of which dimension those values refer to.
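pca.components_ has shape (n_components, n_features): each row is one principal component (a unit-length eigenvector of the data's covariance matrix) and column j holds the weight of original dimension j. The sketch below, again on hypothetical synthetic data, reads off which original column has the largest absolute weight in each component. Absolute values are used because the sign of each component is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 150 samples, 4 independent features (std devs 1, 5, 0.5, 2)
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(150, 4)) * np.array([1.0, 5.0, 0.5, 2.0])

pca = PCA().fit(data_matrix)

# components_ has shape (n_components, n_features):
# row i is principal component i, column j is original feature j.
print(pca.components_.shape)  # (4, 4)

# Each row is a unit-length direction in the original feature space.
print(np.linalg.norm(pca.components_, axis=1))

# For each component, the original feature with the largest absolute weight
dominant_feature = np.argmax(np.abs(pca.components_), axis=1)
for i, j in enumerate(dominant_feature):
    print(f"component {i}: explains {pca.explained_variance_ratio_[i]:.1%}, "
          f"dominated by original feature {j}")
```

With real, correlated data a component usually mixes several columns, so inspecting the whole row of weights is more informative than the single largest entry.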

Transforming the data doesn't help either, because the dimensions are changed in such a way that I can't tell which original dimension each one came from.
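The transformed columns are not the original columns in a new order. With the default whiten=False, transform() subtracts pca.mean_ and projects onto the rows of pca.components_, so transformed column k is the score on principal component k. The minimal sketch below, on hypothetical synthetic data, verifies that relationship and shows that, when all components are kept, the projection can be undone.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 150 samples, 4 independent features (std devs 1, 5, 0.5, 2)
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(150, 4)) * np.array([1.0, 5.0, 0.5, 2.0])

pca = PCA().fit(data_matrix)
transformed = pca.transform(data_matrix)

# With whiten=False (the default), transform() centers the data with mean_
# and projects it onto the rows of components_.
manual = (data_matrix - pca.mean_) @ pca.components_.T
print(np.allclose(transformed, manual))  # True

# Column k of `transformed` is the score on principal component k,
# not original column k. With all components kept, the mapping is invertible.
restored = transformed @ pca.components_ + pca.mean_
print(np.allclose(restored, data_matrix))  # True
```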

Is there any way I can get this information with scikit-learn's PCA? Thanks

asked Mar 12 '13 18:03

Alberto A

1 Answer

The values in pca.explained_variance_ratio_ are the fractions of the total variance explained by each principal component. You can use them to decide how many dimensions (components) PCA should reduce your data to. One way is to use a threshold (e.g., count how many ratios are greater than 0.5, among other choices). Then transform the data with PCA, setting the number of components to the number of ratios above the threshold. Keep in mind that these reduced dimensions are not the dimensions of the original data: each principal component is a linear combination of all the original dimensions.
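The threshold approach described above can be sketched as follows. This is a minimal sketch on hypothetical synthetic data; the 0.5 cutoff is the answer's example, and the max(1, ...) guard is an added assumption so that PCA is never asked for zero components.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 150 samples, 4 independent features (std devs 1, 5, 0.5, 2)
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(150, 4)) * np.array([1.0, 5.0, 0.5, 2.0])

# Fit with all components to inspect the variance ratios
ratios = PCA().fit(data_matrix).explained_variance_ratio_

# Count components whose ratio exceeds the threshold (0.5 is the answer's
# example; any cutoff can be used). Keep at least one component.
threshold = 0.5
n_keep = max(1, int(np.sum(ratios > threshold)))

# Refit with only those components and project the data onto them
reduced = PCA(n_components=n_keep).fit_transform(data_matrix)
print(n_keep, reduced.shape)
```

Because the ratios sum to 1, a cutoff of 0.5 can select at most one component; a smaller cutoff, or a cutoff applied to the cumulative sum np.cumsum(ratios), keeps more.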

You can find example code at this link:

http://scikit-learn.org/dev/tutorial/statistical_inference/unsupervised_learning.html#principal-component-analysis-pca

answered Oct 22 '22 04:10

mad
