I have come across many similar questions on the web, but could not find an answer that solves my problem in a way I can understand. I would appreciate some explanation here to aid my understanding. Thanks in advance!
So,
[COEFF,SCORE,latent,tsquare] = princomp(X)
I understand that the columns of coeff are in order of decreasing component variance. But how do I find the importance of my variables (the original dataset), rather than the importance of the principal components (PCs), which is what coeff seems to present? Is there any way to rank the importance of the variables I have?
I saw that many statistics packages are able to do this, showing which original variables contribute most to the plot and which ones can be removed to prevent over-fitting. Is there a way to do this with MATLAB?
My objective is to plot the data in a 2D plot, meaning I will be using PC1 and PC2, which hold the most significant component variance. So again, how do I know which variables should be retained and which should be discarded?
Can anyone explain this to me? Thanks!
If you only care about projecting your data onto a 2D plane for visualization, then by all means take the first two coordinates of each point from SCORE: these are the coordinates you referred to as PC1 and PC2 in your question.
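As a minimal sketch of that projection: the data matrix below is made up for illustration, and the call uses pca, which I am assuming is available (it needs the Statistics Toolbox). On older releases, princomp(X) from the question returns the same first outputs.

```matlab
% Hypothetical example data: 100 observations (rows) of 5 variables (columns).
rng(0);                      % fix the seed so the run is repeatable
X = randn(100, 5);

% Assumption: pca is available; on older releases use princomp(X) instead.
[COEFF, SCORE] = pca(X);

% Each row of SCORE is one observation expressed in PC coordinates,
% so the first two columns are its PC1 and PC2 coordinates.
pc12 = SCORE(:, 1:2);

scatter(pc12(:, 1), pc12(:, 2));
xlabel('PC1');
ylabel('PC2');
```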
However, if you wish to know which two variables in X contribute most to PC1 and PC2, you'll have to find the entries with maximal absolute value in each of the first two columns of COEFF. The first two columns of COEFF hold the weights of the linear combinations of the variables in X that produce PC1 and PC2, so the row index of a large entry tells you which variable it belongs to.
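A rough sketch of that ranking, under some assumptions: the data is made up, variable 3 is deliberately given a much larger variance so that it should dominate PC1, and the names rankPC1 and rankPC2 are only illustrative. It again uses pca (princomp(X) on older releases).

```matlab
% Hypothetical example data: 100 observations of 5 variables.
rng(0);
X = randn(100, 5);
X(:, 3) = 10 * X(:, 3);      % inflate the variance of variable 3 on purpose

% Assumption: pca is available; on older releases use princomp(X) instead.
COEFF = pca(X);

% Row i of COEFF holds the weights of variable i; column k belongs to PC k.
loadings = abs(COEFF(:, 1:2));

% Sort the variables by the size of their weight in PC1 and in PC2.
[~, rankPC1] = sort(loadings(:, 1), 'descend');
[~, rankPC2] = sort(loadings(:, 2), 'descend');

fprintf('PC1, variables from largest to smallest weight: %s\n', mat2str(rankPC1'));
fprintf('PC2, variables from largest to smallest weight: %s\n', mat2str(rankPC2'));
```

Keep in mind that pca centers the data but does not rescale it by default, so a variable measured in large units can dominate the ranking simply because of its scale. If that is not what you want, standardizing the columns first (for example with zscore) is one option.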