How can I resolve the following dimension mismatch with R's K nearest neighbors?

Tags: r

In the code below, I am trying to use K nearest neighbors with a single predictor. As far as I understand, the number of examples in train.X does not need to match the number of examples in test.X, but R does not seem to be interpreting my input correctly.

library(ISLR)
library(class)

train=(Weekly$Year<2009)
train.X = Weekly$Lag2[train]
test.X = Weekly$Lag2[!train]
train.Direction = Weekly$Direction[train]
knn.pred = knn(train.X, test.X, train.Direction, k=1)

Running the code above produces the error

   Error in knn(train.X, test.X, train.Direction, k = 1) :   
      dims of 'test' and 'train' differ                       

How can I fix train.X and test.X so that R parses them correctly?

merlin2011 asked Oct 16 '13 06:10


1 Answer

The knn function takes matrices or data frames as arguments for the train and test sets, with one row per observation and one column per feature. You're passing in plain vectors, which get interpreted as matrices, but not in the way you want. Specifically, a vector passed as test is interpreted as a single data point, with its values denoting the features. This means that the number of features for train and test is different, as the error message suggests.

To fix, simply convert explicitly, e.g.

knn.pred = knn(data.frame(train.X), data.frame(test.X), train.Direction, k=1)
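
An equivalent approach is to reshape each vector into a one-column matrix yourself. The sketch below illustrates this on made-up data rather than the Weekly data set, so that it runs with only the class package; the variables x, y, is_train, train.M and test.M are hypothetical names introduced for the example.

```r
library(class)  # provides knn()

# Hypothetical stand-in data: one numeric predictor, two classes.
set.seed(1)
x <- c(rnorm(20, mean = -1), rnorm(20, mean = 1))
y <- factor(rep(c("Down", "Up"), each = 20))

# Uneven split: 30 training rows, 10 test rows.
is_train <- seq_along(x) %% 4 != 0
train.X <- x[is_train]
test.X  <- x[!is_train]
train.Direction <- y[is_train]

# A bare vector has no dim attribute, so knn() has to guess its shape.
is.null(dim(train.X))

# Make the shape explicit: rows are observations, the single column is the feature.
train.M <- matrix(train.X, ncol = 1)
test.M  <- matrix(test.X, ncol = 1)

# Both inputs now have one column, so the dimension check passes
# even though the row counts differ.
knn.pred <- knn(train.M, test.M, train.Direction, k = 1)
length(knn.pred)  # one prediction per test row
```

With the question's data, the same reshaping would be applied to Weekly$Lag2[train] and Weekly$Lag2[!train]. Either form, data.frame() or a one-column matrix, gives knn one row per example.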

Lars Kotthoff answered Nov 01 '22 19:11