I am trying to run knnreg from the package caret. For some reason, this training set works:
> summary(train1)
       V1                V2                V3
13     : 10474    1      :  6435    7      :  8929
10     : 10315    2      :  6435    6      :  8895
4      : 10272    3      :  6435    9      :  8892
1      : 10244    4      :  6435    10     :  8892
2      : 10238    7      :  6435    15     :  8874
24     : 10228    8      :  6435    40     :  8870
(Other):359799    (Other):382960    (Other):368218
While this one won't work:
> summary(train2)
       V1                V2                V3                   V4
13     : 10474    1      :  6436    7      :  8929    Christmas   :  5946
10     : 10315    2      :  6436    6      :  8895    Labor Day   :  8861
4      : 10272    3      :  6438    9      :  8892    None        :391909
1      : 10244    4      :  6435    10     :  8892    Super Bowl  :  8895
2      : 10238    7      :  6435    15     :  8874    Thanksgiving:  5959
24     : 10228    8      :  6435    40     :  8870
(Other):359799    (Other):382960    (Other):368218
Here is the target vector:
> summary(Target)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-499 200 712 1980 20210 693100
The error I get is during the prediction phase:
> fit <- knnreg(train2, Target, k = 2)
> Prediction <- predict(fit, newdata=test)
Error in knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, :
NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning messages:
1: In knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, :
NAs introduced by coercion
2: In knnregTrain(train = list(V1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, :
NAs introduced by coercion
While this is my test set:
> summary(test)
       V1                V2                V3                   V4
13     :  2836    1      :  1755    51     :  3002    Christmas   :  2988
4      :  2803    2      :  1755    49     :  2989    Labor Day   :     0
19     :  2799    3      :  1755    52     :  2988    None        :106136
2      :  2797    4      :  1755    50     :  2986    Super Bowl  :  2964
27     :  2791    7      :  1755    6      :  2984    Thanksgiving:  2976
24     :  2790    8      :  1755    47     :  2976
(Other): 98248    (Other):104534    (Other): 97139
What am I missing?
EDIT: Switching the V4 labels to '1', '2', ... actually fixes the problem. Does the algorithm treat my features as numerical even though they're factors?
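The "NAs introduced by coercion" warnings suggest that the factor labels are being turned into numbers somewhere along the way. The sketch below reproduces that effect in base R only; the toy values are hypothetical, and whether knnreg takes exactly this path internally is an assumption, not something the output above confirms.

```r
# Toy factors standing in for V3 and V4 (hypothetical values).
v3 <- factor(c("7", "6", "40"))
v4 <- factor(c("Christmas", "None", "Super Bowl"))

# as.matrix() on a data frame of factors gives a character matrix of the labels.
m <- as.matrix(data.frame(V3 = v3, V4 = v4))
print(is.character(m))

# Digit-only labels parse as numbers; holiday names cannot and become NA.
v3_num <- suppressWarnings(as.numeric(m[, "V3"]))
v4_num <- suppressWarnings(as.numeric(m[, "V4"]))
print(v3_num)
print(v4_num)
```

If this is what happens, labels such as '1', '2', ... survive the conversion, which would match the EDIT, but the resulting numbers are then treated as ordinary quantities rather than categories.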
Limitations of KNN: it doesn't work well with large datasets. Since KNN is a distance-based algorithm, predicting a new point requires computing its distance to every existing point, which becomes expensive as the dataset grows and degrades the performance of the algorithm.
The KNN algorithm can be used for both classification and regression problems. It uses 'feature similarity' to predict the value of a new data point: the prediction is based on the training points that lie closest to it.
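To make the distance-based idea concrete, here is a minimal base R sketch of KNN regression for a single query point. The helper `knn_predict` and the toy data are hypothetical and are not caret's implementation; the sketch only shows that every training row's distance is computed and the k nearest targets are averaged.

```r
# Hypothetical helper: predict one query point by averaging the k nearest targets.
knn_predict <- function(x_train, y_train, query, k = 2) {
  # Euclidean distance from the query to every training row.
  d <- sqrt(rowSums(sweep(x_train, 2, query)^2))
  # Average the targets of the k rows with the smallest distance.
  mean(y_train[order(d)[seq_len(k)]])
}

x <- matrix(c(1, 2, 3, 10), ncol = 1)  # one numeric feature, four training rows
y <- c(10, 20, 30, 100)                # toy targets

pred <- knn_predict(x, y, query = 2.4, k = 2)
print(pred)  # nearest rows are x = 2 and x = 3, so the mean of 20 and 30
```

Because distances are plain arithmetic on the feature values, every column has to be numeric, which is why categorical columns need to be encoded first.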
I realized that knnreg accepts only numerical values. When I trained the model with train1, it treated all the values as numerical (when in fact they are categorical), because their labels happen to be numbers. train2 returns an error because the labels of V4 are not numerical, and knnreg can't convert them into numbers either, so they end up as NA.
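One common workaround is to encode the factor as indicator (one-hot) columns before fitting, so that every column is genuinely numeric. The sketch below uses base R's `model.matrix` on hypothetical toy data (`train_toy`, `test_toy`, `target_toy` are made up for illustration). It assumes the train and test factors share the same levels, and it only calls `knnreg` if caret is installed.

```r
# Hypothetical toy data; the column name and levels mirror V4 in the question.
lv <- c("Christmas", "Labor Day", "None", "Super Bowl", "Thanksgiving")
train_toy <- data.frame(V4 = factor(
  c("None", "Christmas", "Labor Day", "None", "Super Bowl", "Thanksgiving"),
  levels = lv))
test_toy <- data.frame(V4 = factor(c("None", "Christmas"), levels = lv))
target_toy <- c(700, 21000, 1500, 650, 9000, 12000)

# One 0/1 column per level; "- 1" drops the intercept so no level is left out.
x_train <- model.matrix(~ V4 - 1, data = train_toy)
x_test  <- model.matrix(~ V4 - 1, data = test_toy)

# Shared levels give both matrices the same columns, even for levels absent in test.
print(colnames(x_test))

# Fit and predict on the numeric matrices (skipped when caret is not available).
if (requireNamespace("caret", quietly = TRUE)) {
  fit <- caret::knnreg(x_train, target_toy, k = 2)
  print(predict(fit, newdata = x_test))
}
```

With several factor columns, caret's `dummyVars` can build the indicator columns for all of them in one step. Note that V1, V2 and V3 in the question have many levels, so this encoding would produce a wide matrix.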