I used caret to train an rpart model, as shown below.
library(caret)

# 80/20 train/test split on Happiness
trainIndex <- createDataPartition(d$Happiness, p = .8, list = FALSE)
dtrain <- d[trainIndex, ]
dtest <- d[-trainIndex, ]

# 10-fold cross-validation, repeated 10 times
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

fitRpart <- train(Happiness ~ ., data = dtrain, method = "rpart",
                  trControl = fitControl)
testRpart <- predict(fitRpart, newdata = dtest)
dtest contains 1296 observations, so I expected testRpart to be a vector of length 1296. Instead it is 1077 long, i.e. 219 short.
When I ran the prediction on only the first 220 rows of dtest, I got back a single prediction, so the output is consistently 219 short.
Why does this happen, and what can I do to get one prediction per input row?
Edit: d can be loaded from here to reproduce the above.
I downloaded your data and found what explains the discrepancy: dtest contains rows with missing values, and those rows are dropped when you predict.
If you remove the missing values from the test set first, the lengths of the input and output match:
testRpart <- predict(fitRpart, newdata = na.omit(dtest))
Note that nrow(na.omit(dtest)) is 1103, and length(testRpart) is also 1103. So you need a strategy to address missing values. See ?predict.rpart and the options for the na.action parameter to choose what you want.
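To see how many rows are affected before choosing a strategy, something like the following may help. It is only a sketch on made-up data: the toy data frame and its columns x1, x2, and y are hypothetical stand-ins for dtest, and the model is fitted with rpart directly rather than through caret.

```r
library(rpart)

# Hypothetical toy data standing in for dtest (not the asker's data)
set.seed(42)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- 3 * toy$x1 + rnorm(200, sd = 0.1)
toy$x2[1:25] <- NA   # 25 rows now have a missing predictor

# How many rows contain at least one NA, and how many survive na.omit()?
n_incomplete <- sum(!complete.cases(toy))
n_kept <- nrow(na.omit(toy))

# Which columns carry the missing values?
na_per_column <- colSums(is.na(toy))

fit <- rpart(y ~ x1 + x2, data = toy)

# predict.rpart defaults to na.action = na.pass, so every row gets a prediction
pred_pass <- predict(fit, newdata = toy)

# Dropping incomplete rows first yields a shorter prediction vector
pred_omit <- predict(fit, newdata = na.omit(toy))

c(rows = nrow(toy), incomplete = n_incomplete,
  pass = length(pred_pass), omit = length(pred_omit))
```

Running the same complete.cases() and colSums(is.na()) checks on dtest should show whether the number of incomplete rows equals the number of missing predictions.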
Similar to what Josh mentioned, if you need to generate predictions using predict.train from caret, simply pass na.pass as the na.action:
testRpart <- predict(fitRpart, newdata = dtest, na.action = na.pass)
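If you would rather not predict for incomplete rows at all but still want the output aligned with the input, one option is to predict on the complete rows and pad the rest with NA. The sketch below is an illustration only: predict_padded is a hypothetical helper (not part of caret or rpart), the toy data is made up, and it assumes a numeric outcome. The same indexing pattern should carry over to a caret train object, though that is not tested here.

```r
library(rpart)

# Hypothetical helper (not part of caret or rpart): predict on complete rows
# only, then pad the result back to one entry per input row.
# Assumes numeric predictions; a factor outcome would need a different filler.
predict_padded <- function(model, newdata, predictors) {
  ok <- complete.cases(newdata[, predictors, drop = FALSE])
  out <- rep(NA_real_, nrow(newdata))
  out[ok] <- predict(model, newdata = newdata[ok, , drop = FALSE])
  out
}

# Made-up data with three incomplete rows
set.seed(7)
toy <- data.frame(x1 = rnorm(120), x2 = rnorm(120))
toy$y <- 2 * toy$x1 - toy$x2 + rnorm(120, sd = 0.1)
toy$x1[c(3, 50, 99)] <- NA

fit <- rpart(y ~ x1 + x2, data = toy)
padded <- predict_padded(fit, toy, predictors = c("x1", "x2"))

length(padded)        # same as nrow(toy)
which(is.na(padded))  # positions of the rows that were skipped
```

The trade-off against na.pass is that incomplete rows get an explicit NA instead of a prediction based on surrogate splits.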
Note: moving this to a separate answer based on Ricky's comment on Josh's answer above for visibility.