
How to determine the importance of variables in PCA using Matlab?

Tags: matlab, pca

I have come across many similar questions on the web, but could not find one that solves my problem in a way I can understand. I would appreciate some explanation here to aid my understanding. Thanks in advance!

So,

[COEFF,SCORE,latent,tsquare] = princomp(X)

I understand that the columns of COEFF are in order of decreasing component variance. But how do I find the importance of my variables (the original dataset), rather than the importance of the principal components (PCs), which is what COEFF seems to present? Is there any way to rank the importance of the variables I have?

I have seen that many statistical packages can do this, showing which original variables contribute most to the plot and which can be removed to prevent over-fitting. Is there a way to do this with MATLAB?

My objective is to plot the data in 2D, meaning I will be using PC1 and PC2, which have the largest component variances. So again, how do I know which variables should be retained and which should be discarded?

Can anyone explain this to me? Thanks!

maureen asked Nov 04 '22 06:11



1 Answer

If you only care about projecting your data onto a 2D plane for visualization, then by all means take the first two columns of SCORE: each row gives the coordinates of one point along what you referred to as PC1 and PC2 in your question.
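As a rough sketch of that projection, assuming the `pca` function from the Statistics and Machine Learning Toolbox (it returns the same first outputs as `princomp`), the 2D plot could look like the following. The data in `X` is synthetic and exists only to make the example runnable; `pc12` and `explained2D` are illustrative names, not part of the original question.

```matlab
% Synthetic data for illustration only: 100 observations, 5 variables.
rng(0);
X = randn(100, 5);
X(:, 2) = X(:, 1) + 0.1 * randn(100, 1);   % make variable 2 track variable 1

% pca centers the columns of X by default.
[COEFF, SCORE, latent] = pca(X);

% The first two columns of SCORE are the PC1 and PC2 coordinates
% of every observation.
pc12 = SCORE(:, 1:2);

% Fraction of the total variance kept by the 2D projection.
explained2D = sum(latent(1:2)) / sum(latent);

scatter(pc12(:, 1), pc12(:, 2), 'filled');
xlabel('PC1');
ylabel('PC2');
title(sprintf('First two PCs (%.1f%% of variance)', 100 * explained2D));
```

Checking `explained2D` before trusting the picture is worthwhile: if the first two components keep only a small share of the variance, the 2D plot may hide most of the structure in the data.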

However, if you wish to know which variables in X contribute most to PC1 and PC2, look for the entries with the largest absolute value in the first two columns of COEFF, since those columns hold the coefficients of the linear combinations of the variables in X that produce PC1 and PC2.
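A minimal sketch of that ranking follows, again assuming `pca` and `zscore` from the Statistics and Machine Learning Toolbox. The data and the names in `varNames` are hypothetical. Standardizing with `zscore` is an added assumption so that loadings are comparable when variables have different units, and the combined score `contrib` (squared loadings weighted by each component's variance) is just one possible heuristic, not something prescribed above.

```matlab
% Synthetic data for illustration only: 100 observations, 5 variables.
rng(0);
X = randn(100, 5);
X(:, 2) = X(:, 1) + 0.1 * randn(100, 1);   % make variable 2 track variable 1
varNames = {'v1', 'v2', 'v3', 'v4', 'v5'}; % hypothetical variable names

% Assumption: standardize so variables with large units do not dominate.
Z = zscore(X);
[COEFF, ~, latent] = pca(Z);

% Absolute loadings of each original variable on PC1 and PC2.
absLoad = abs(COEFF(:, 1:2));

% Rank the variables separately for each of the two components.
[~, orderPC1] = sort(absLoad(:, 1), 'descend');
[~, orderPC2] = sort(absLoad(:, 2), 'descend');

% One possible combined score (a heuristic): squared loadings
% weighted by the variance of the corresponding component.
contrib = (COEFF(:, 1:2) .^ 2) * latent(1:2);
[~, orderBoth] = sort(contrib, 'descend');

disp('Variables ranked by |loading| on PC1:');
disp(varNames(orderPC1));
disp('Variables ranked by |loading| on PC2:');
disp(varNames(orderPC2));
disp('Variables ranked by combined contribution to PC1 and PC2:');
disp(varNames(orderBoth));
```

Variables near the bottom of `orderBoth` have little influence on the 2D plot, but that alone does not show they are uninformative: they may still load heavily on later components.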

Shai answered Nov 15 '22 09:11
