
Principal Component Analysis with Caret

Tags: r, pca, r-caret

I'm using Caret's PCA preprocessing.

multinomFit <- train(LoanStatus~., 
                     train, 
                     method = "multinom", 
                     std=TRUE, 
                     family=binomial, 
                     metric = "ROC",
                     thresh = 0.85, 
                     verbose = TRUE, 
                     pcaComp=7, 
                     preProcess=c("center", "scale", "pca"), 
                     trControl = ctrl)

I specified the number of PCA components to be 7. Why does the summary show the fit using 68 components?

summary(multinomFit)

Call:
multinom(formula = .outcome ~ ., data = dat, decay = param$decay, 
    std = TRUE, family = ..2, thresh = 0.85, verbose = TRUE, 
    pcaComp = 7)

Coefficients:
                   Values  Std. Err.
(Intercept)  1.6650694329 0.03760419
PC1         -0.1023790683 0.01474812
PC2          0.0375344688 0.01554707
PC3         -0.1012080589 0.01870754
PC4         -0.1004020357 0.02418817
PC5          0.0707421015 0.02403815
PC6          0.0034671796 0.02535015
PC7          0.1218028495 0.02852909
PC8          0.2191031963 0.03291266
PC9          0.1534144811 0.02986523
PC10        -0.0665337138 0.02999863
PC11        -0.1313662645 0.03032963
PC12         0.0668422208 0.03397493
PC13         0.0002770594 0.03282500
PC14        -0.0883400819 0.03337427
PC15         0.0221726084 0.03323058
PC16        -0.0222984250 0.03210718
PC17        -0.0394014147 0.03282160
PC18         0.0280583827 0.03459664
PC19        -0.0295243295 0.03430506
PC20        -0.0149573710 0.03358775
PC21         0.0653722886 0.03388418
PC22        -0.0114810174 0.03583050
PC23        -0.0594912738 0.03376091
PC24         0.0117123190 0.03476835
PC25        -0.0406770388 0.03507369
PC26         0.0373200991 0.03440807
PC27         0.0050323427 0.03366658
PC28         0.0678087286 0.03516197
PC29         0.0234294196 0.03459586
PC30         0.0540846491 0.03464610
PC31         0.1054946257 0.03459315
PC32         0.0216292907 0.03485001
PC33         0.0247627243 0.03488016
PC34         0.0033126360 0.03402770
PC35        -0.0434168834 0.03468038
PC36        -0.0098687981 0.03497515
PC37        -0.0193788562 0.03268054
PC38         0.0572276670 0.03837009
PC39         0.0535213906 0.03737078
PC40         0.0007157334 0.03321343
PC41        -0.0286461676 0.03546742
PC42         0.0640903943 0.03378855
PC43        -0.0111873647 0.03626063
PC44        -0.0304589978 0.03448459
PC45         0.0191817954 0.03690284
PC46        -0.0330040383 0.03277895
PC47         0.0328641857 0.03460263
PC48         0.0204941541 0.03460759
PC49         0.0345105736 0.04002168
PC50         0.0076131373 0.03621336
PC51         0.0082765068 0.03299395
PC52        -0.0594596197 0.03633509
PC53        -0.0276656822 0.03596515
PC54         0.0411414647 0.03529887
PC55        -0.0644394706 0.03490393
PC56        -0.0266971243 0.03403656
PC57        -0.1415322396 0.03681683
PC58        -0.0332329932 0.03469459
PC59        -0.0273683007 0.03524604
PC60         0.0450430472 0.03586438
PC61        -0.0708218651 0.03807458
PC62         0.1523605734 0.03851722
PC63        -0.0385759566 0.03920662
PC64        -0.0602633030 0.03902837
PC65         0.0547553856 0.03970764
PC66         0.0727331180 0.04273518
PC67         0.1142574406 0.04522347
PC68        -0.1059928013 0.04077592

Residual Deviance: 5273.035 
AIC: 5411.035 

Finally, is there a way to map the 7 PCA factors which describe 85% of the variation in the data back to 7 input attributes in the original observations?

Thanks in advance.

dbl001 asked Jan 07 '23 13:01


1 Answer

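In your call, thresh and pcaComp were handed on to multinom() rather than to the PCA step (they show up in the Call: output of the summary), so the pre-processing most likely fell back to its default variance threshold of 0.95, which is where the 68 components come from. You can pass pre-processing options via preProcOptions in trainControl(); have a look at ?trainControl. Here is an example: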

ctrl <- trainControl(method = "repeatedcv", 
                     repeats = 3, 
                     classProbs = TRUE,
                     preProcOptions = list(thresh = 0.85), #or list(pcaComp = 7)
                     summaryFunction = twoClassSummary)

multinomFit <- train(LoanStatus~., train, 
                     method = "multinom", 
                     family=binomial, 
                     metric = "ROC",  
                     verbose = TRUE, 
                     preProcess=c("center", "scale", "pca"), 
                     trControl = ctrl)

Note that if you specify the number of PCA components with pcaComp = 7, it overrides thresh (see ?preProcess), so use only one of them.
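To check the override behaviour outside of train(), here is a minimal sketch that calls preProcess() directly. It uses the built-in iris data purely as a stand-in for the loan predictors (an assumption for illustration), and the object names pp_thresh and pp_comp are made up for this example.

```r
library(caret)

x <- iris[, 1:4]  # stand-in predictors, not the loan data from the question

# Keep enough components to reach 85% cumulative variance.
pp_thresh <- preProcess(x, method = c("center", "scale", "pca"), thresh = 0.85)

# Ask for 3 components explicitly; thresh is also supplied but pcaComp wins.
pp_comp <- preProcess(x, method = c("center", "scale", "pca"),
                      thresh = 0.85, pcaComp = 3)

pp_thresh$numComp           # number of components chosen via the threshold
pp_comp$numComp             # number of components fixed by pcaComp
ncol(predict(pp_comp, x))   # transformed data: one column per kept component
```

The same numComp element should be available on multinomFit$preProcess after training, which is a quick way to confirm how many components the model actually used.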

You can view the contribution of variables to each PCA component by:

multinomFit$preProcess$rotation 
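
Keep in mind that each principal component is a linear combination of all the input variables, so there is generally no one-to-one mapping from 7 components back to 7 original attributes. What you can do is inspect which variables load most heavily on each component. The sketch below does this with base R's prcomp() on the built-in iris data (a stand-in, not the loan data); it assumes the rotation matrix from caret has the same layout, with variables in rows and components in columns. The names loadings, top_var and contrib are illustrative.

```r
# Base-R sketch; iris is only a stand-in for the real predictors.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
loadings <- pca$rotation   # rows = original variables, columns = PC1, PC2, ...

# For each component, the variable with the largest absolute loading.
top_var <- rownames(loadings)[apply(abs(loadings), 2, which.max)]
names(top_var) <- colnames(loadings)
print(top_var)

# Percentage contribution of each variable to each component
# (squared loadings sum to 1 within a component).
contrib <- round(100 * loadings^2, 1)
print(contrib)
```

With the fitted model, replacing pca$rotation by multinomFit$preProcess$rotation should give the same kind of table for the loan predictors.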

howaj answered Jan 20 '23 18:01