I have a model (fit), based on historical data up to last month. Now I would like to use the model to make predictions for the current month. When I run the following code:
predicted <- predict(fit, subset(testData, select = -Readmit))
I get the following error:
Error in UseMethod("predict") : no applicable method for 'predict'
applied to an object of class "train"
Notes:
The model was created with the train function from the caret package, using the random forest algorithm.
predict is a generic function that invokes a specific predict method based on the class of its first argument. In my case the underlying model is:
> fit$modelInfo$label
[1] "Random Forest"
Therefore, the predict method invoked will be predict.randomForest. See the caret documentation for more info.
Here is a summary of the source code for generating the model and invoking it:
# Script-1: create a model
fit <- train(subset(testData, select = -Readmit), testData$Readmit)
saveRDS(fit, modelFileName) # save the fit object to a file
# Script-2: predict
fit <- readRDS(modelFileName) # load the previously generated model
predicted <- predict(fit, subset(testData, select = -Readmit))
Note: Generating the model takes about 3 hours, which is why I save the object for later reuse.
The training data set stored in the model has the following structure:
> str(fit$trainingData)
'data.frame': 29955 obs. of 27 variables:
$ Acuity : Factor w/ 3 levels "Elective ","Emergency ",..: 2 2 2 1 1 2 2 2 1 1 ...
$ AgeGroup : Factor w/ 10 levels "100-105","65-70",..: 8 6 9 9 5 4 9 2 3 2 ...
$ IsPriority : int 0 0 0 0 0 0 0 0 0 0 ...
$ QNXTReferToId : int 115 1703712 115 3690 1948 115 109 512 481 1785596 ...
$ QNXTReferFromId : int 1740397 1724801 1711465 1704170 1714272 1731911 1535 1712758 1740614 1760252 ...
$ iscasemanagement : Factor w/ 2 levels "N","Y": 2 1 1 2 2 1 2 1 2 2 ...
$ iseligible : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
$ referralservicecode : Factor w/ 11 levels "12345","278",..: 1 1 1 9 9 1 1 6 9 9 ...
$ IsHighlight : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
$ admittingdiagnosiscode: num 439 786 785 786 428 ...
$ dischargediagnosiscode: num 439 0 296 786 428 ...
$ RealLengthOfStay : int 3 1 6 1 2 3 3 7 3 2 ...
$ QNXTPCPId : int 1740397 1724801 1711465 1704170 1714272 1731911 1535 1712758 1740614 1760252 ...
$ QNXTProgramId : Factor w/ 3 levels "QMXHPQ0839 ",..: 1 1 1 1 1 1 1 1 1 1 ...
$ physicalzipcode : int 33054 33712 33010 33809 33010 33013 33142 33030 33161 33055 ...
$ gender : Factor w/ 2 levels "F","M": 1 1 1 1 2 1 1 2 2 1 ...
$ ethnicitycode : Factor w/ 4 levels "ETHN0001 ",..: 4 4 4 4 4 4 4 4 4 4 ...
$ dx1 : num 439 786 296 786 428 ...
$ dx2 : num 439 292 785 786 428 ...
$ dx3 : num 402 0 250 0 0 ...
$ svc1 : int 0 120 120 762 762 120 120 120 762 762 ...
$ svc2 : int 120 0 0 0 0 0 0 0 0 0 ...
$ svc3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ Disposition : Factor w/ 28 levels "0","APPEAL & GRIEVANCE REVIEW ",..: 11 11 16 11 11 11 11 11 11 11 ...
$ AvgIncome : Factor w/ 10 levels "-1",">100k","0-25k",..: 3 6 3 8 3 4 3 5 4 4 ...
$ CaseManagerNameID : int 124 1 1 19 20 1 16 1 43 20 ...
$ .outcome : Factor w/ 2 levels "NO","YES": 1 2 2 1 1 1 2 2 1 1 ...
The testData has the following structure:
> str(subset(testData, select = -Readmit))
'data.frame': 610 obs. of 26 variables:
$ Acuity : Factor w/ 4 levels "0","Elective ",..: 3 2 4 2 2 2 4 3 3 3 ...
$ AgeGroup : Factor w/ 9 levels "100-105","65-70",..: 4 3 5 4 2 9 4 2 4 6 ...
$ IsPriority : int 0 0 0 0 0 0 1 1 1 1 ...
$ QNXTReferToId : int 2140 482 1703785 1941 114 1714905 1703785 98 109 109 ...
$ QNXTReferFromId : int 1791383 1729375 1718532 1746336 1718267 1718267 1718532 98 109 109 ...
$ iscasemanagement : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 1 2 2 1 ...
$ iseligible : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
$ referralservicecode : Factor w/ 7 levels "12345","IPMAT ",..: 5 1 1 1 1 1 1 5 1 5 ...
$ IsHighlight : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
$ admittingdiagnosiscode: num 11440 11317 11420 11317 1361 ...
$ dischargediagnosiscode: num 11440 11317 11420 11317 1361 ...
$ RealLengthOfStay : int 1 2 4 3 1 1 16 1 1 3 ...
$ QNXTPCPId : int 3212 1713678 1738430 1713671 1720569 1791640 1725962 1148 1703290 1705009 ...
$ QNXTProgramId : Factor w/ 2 levels "QMXHPQ0839 ",..: 1 1 1 1 1 1 1 1 1 1 ...
$ physicalzipcode : int 34744 33175 33844 33178 33010 33010 33897 33126 33127 33125 ...
$ gender : Factor w/ 2 levels "F","M": 2 1 2 1 2 2 2 1 1 2 ...
$ ethnicitycode : Factor w/ 1 level "No Ethnicity ": 1 1 1 1 1 1 1 1 1 1 ...
$ dx1 : num 11440 11317 11420 11317 1361 ...
$ dx2 : num 11440 11317 11420 11317 1361 ...
$ dx3 : num 0 1465 0 11326 0 ...
$ svc1 : int 52648 27447 50040 27447 55866 55866 51595 0 99221 300616 ...
$ svc2 : int 76872 120 50391 120 120 38571 120 762 120 0 ...
$ svc3 : int 762 0 120 0 0 51999 0 0 0 762 ...
$ Disposition : Factor w/ 14 levels "0","DENIED- Not Medically Necessary ",..: 3 5 3 4 3 3 5 3 3 5 ...
$ AvgIncome : Factor w/ 10 levels "-1",">100k","0-25k",..: 6 7 5 9 3 3 6 4 3 4 ...
$ CaseManagerNameID : int 1 2 3 4 5 6 7 8 9 7 ...
The variable structure is the same, except that some factor variables have different levels because new values have appeared. For example, Acuity has 3 levels in the model and 4 levels in the testing data.
I have no way of knowing up front all possible levels for all variables.
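To make the level mismatch concrete, here is a minimal sketch that lists, for each factor column, the levels that appear in the test data but not in the training data. The data frames train_df and test_df and the helper new_levels are hypothetical stand-ins (the values are made up); in practice they would correspond to fit$trainingData and testData.

```r
# Hypothetical stand-ins for fit$trainingData and testData (values are made up).
train_df <- data.frame(
  Acuity = factor(c("Elective", "Emergency", "Urgent")),
  gender = factor(c("F", "M", "F"))
)
test_df <- data.frame(
  Acuity = factor(c("0", "Elective", "Emergency", "Urgent")),
  gender = factor(c("M", "F", "M", "F"))
)

# For each factor column of the training data that also exists in the test
# data, return the test levels that were never seen during training.
new_levels <- function(train, test) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  factor_cols <- intersect(factor_cols, names(test))
  unseen <- lapply(factor_cols, function(col) {
    setdiff(levels(test[[col]]), levels(train[[col]]))
  })
  names(unseen) <- factor_cols
  Filter(length, unseen)  # keep only columns with at least one new level
}

mismatch <- new_levels(train_df, test_df)
print(mismatch)
```

A check like this only reports the differences; how to treat rows with unseen levels is a separate decision.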
Any advice, please...
Thanks in advance,
David
I think I found why this happened. predict is a generic function from the stats package. I use the namespace :: notation for invoking functions from the caret package (which is the recommendation when creating user packages), and the equivalent predict function in the caret package is predict.train, an internal function that cannot be invoked directly by an external application. The only way to invoke it is through the generic predict function from the stats package. In the call predicted <- predict(fit, subset(testData, select = -Readmit)), the generic uses the class of the first input argument to identify which particular predict method to invoke.
In this particular case the class of the object fit is train, so the call actually goes to the predict.train function from the caret package. That function in turn takes care of calling the prediction function that matches the algorithm (method) used, for example predict.gbm or predict.glm. This is explained in detail in the caret documentation section "5.7 Extracting Predictions and Class Probabilities".
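The dispatch mechanism described above can be illustrated with a minimal base-R sketch. The class toy_model and its method predict.toy_model are hypothetical names used only for illustration; they are not part of caret.

```r
# Toy S3 dispatch: the generic picks a method from the class of its first
# argument. "toy_model" and predict.toy_model are hypothetical names.
fit_toy <- structure(list(slope = 2), class = "toy_model")

predict.toy_model <- function(object, newdata, ...) {
  object$slope * newdata
}

# predict() is the generic from stats; it finds predict.toy_model
# because class(fit_toy) is "toy_model".
result <- predict(fit_toy, c(1, 2, 3))

# When no method is available for the class, dispatch fails with the same
# kind of "no applicable method" error as in the question.
unknown <- structure(list(), class = "no_such_class")
err <- tryCatch(predict(unknown), error = function(e) conditionMessage(e))
print(err)
```

The second call shows that the error in the question simply means the generic could not find a method for the object's class in the current session.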
Therefore, the :: notation works well for other functions in the package, such as caret::train, but not for this particular one, predict. In such cases it is necessary to load the package explicitly, so that the generic can find and invoke the predict.train function.
In short, the solution is to add the following line before invoking the predict function:
library(caret)
Then the error disappears.
Based on the answer from @David Leal, I tried loading library(caret) before calling the predict function, but it did not help.
After some experimenting, I realized that I also had to load the library that contains the model itself. In my case, I had to call library(kernlab) for support vector machines.
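Putting both answers together, one possible approach is to attach caret and the model's own packages before predicting. The sketch below assumes that fit is a caret train object whose fit$modelInfo$library element lists the packages of the underlying model; required_packages and attach_model_packages are hypothetical helper names, not caret functions.

```r
# Sketch: collect the packages a saved caret model needs before predict().
# Assumes fit$modelInfo$library names the underlying model's packages.
# required_packages() and attach_model_packages() are hypothetical helpers.
required_packages <- function(fit) {
  unique(c("caret", fit$modelInfo$library))
}

attach_model_packages <- function(fit) {
  pkgs <- required_packages(fit)
  for (pkg in pkgs) {
    # library() attaches the package, which also makes its methods available
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage (commented out because it needs a saved model and the packages):
# fit <- readRDS(modelFileName)
# attach_model_packages(fit)
# predicted <- predict(fit, subset(testData, select = -Readmit))
```

Deriving the list from the model object avoids hard-coding a package such as kernlab, but whether attaching is strictly needed for every model type is not something this sketch verifies.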