
How to create a decision boundary graph for kNN models in the Caret package?

I'd like to plot a decision boundary for the model created by the Caret package. Ideally, I'd like a general case method for any classifier model from Caret. However, I'm currently working with the kNN method. I've included code below that uses the wine quality dataset from UCI which is what I'm working with right now.

I found this method that works with the generic kNN method in R, but can't figure out how to map it to Caret -> https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o/21602#21602

    library(caret)

    set.seed(300)

    wine.r <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep=';')
    wine.w <- read.csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv', sep=';')

    wine.r$style <- "red"
    wine.w$style <- "white"

    wine <- rbind(wine.r, wine.w)

    wine$style <- as.factor(wine$style)

    formula <- as.formula(quality ~ .)

    dummies <- dummyVars(formula, data = wine)
    dummied <- data.frame(predict(dummies, newdata = wine))
    dummied$quality <- wine$quality

    wine <- dummied

    numCols <- !colnames(wine) %in% c('quality', 'style.red', 'style.white')

    low <- wine$quality <= 6
    high <- wine$quality > 6
    wine$quality[low] = "low"
    wine$quality[high] = "high"
    wine$quality <- as.factor(wine$quality)

    indxTrain <- createDataPartition(y = wine[, names(wine) == "quality"], p = 0.7, list = F)

    train <- wine[indxTrain,]
    test <- wine[-indxTrain,]

    corrMat <- cor(train[, numCols])
    correlated <- findCorrelation(corrMat, cutoff = 0.6)

    ctrl <- trainControl(
                         method="repeatedcv",
                         repeats=5,
                         number=10,
                         classProbs = T
                         )

    t1 <- train[, -correlated]
    grid <- expand.grid(.k = c(1:20))

    knnModel <- train(formula, 
                      data = t1, 
                      method = 'knn', 
                      trControl = ctrl, 
                      tuneGrid = grid, 
                      preProcess = 'range'
                      )

    t2 <- test[, -correlated]
    knnPred <- predict(knnModel, newdata = t2)

    # How do I render the decision boundary?
asked Sep 08 '15 04:09 by James Kyle


1 Answer

The first step is to understand what the code you linked is actually doing. In fact, you can produce such a graph without involving kNN at all.

For example, let's take some sample data and simply "colour" the lower-left quadrant of it.

Step 1

Generate a grid. The plot works by creating a point at every coordinate and recording which group that point belongs to. In R this is done with expand.grid, which enumerates every combination of the axis values.

x1 <- 1:200
x2 <- 50:250

cgrid <- expand.grid(x1=x1, x2=x2)
# our "prediction" colours the bottom left quadrant
cgrid$prob <- 1
cgrid[cgrid$x1 < 100 & cgrid$x2 < 170, c("prob")] <- 0

If this were kNN, prob would be the model's prediction for that particular point.

Step 2

Plotting it is now relatively straightforward. The contour function expects a matrix, so first reshape the probabilities into one.

matrix_val <- matrix(cgrid$prob, 
                     length(x1), 
                     length(x2))
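
The reshape only lines up with the axes because expand.grid varies its first argument fastest and matrix fills column by column. As a small sanity check (a sketch on a toy grid, where the val column is made up purely so each value encodes its own coordinates):

```r
# Toy axes, much smaller than the real grid
x1 <- 1:3
x2 <- 10:11

g <- expand.grid(x1 = x1, x2 = x2)   # x1 varies fastest, then x2
g$val <- g$x1 * 100 + g$x2           # hypothetical value encoding (x1, x2)

# matrix() fills column-wise, so rows follow x1 and columns follow x2
m <- matrix(g$val, length(x1), length(x2))

# m[i, j] therefore belongs to the point (x1[i], x2[j])
stopifnot(m[2, 1] == 210, m[3, 2] == 311)
```

If the axes were passed to expand.grid in the other order, the matrix would need to be built with the dimensions swapped as well.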

Step 3

Then you can proceed as the linked answer did:

contour(x1, x2, matrix_val, levels=0.5, labels="", xlab="", ylab="", main=
          "Some Picture", lwd=2, axes=FALSE)
gd <- expand.grid(x=x1, y=x2)
points(gd, pch=".", cex=1.2, col=ifelse(cgrid$prob==1, "coral", "cornflowerblue"))
box()

Output:

[image: somepic]


Now back to your particular example. I'm going to use iris, because your data wasn't very interesting to look at, but the same principle applies. To create the grid, you need to choose two predictors for the x and y axes and hold every other predictor fixed at a single value.

knnModel <- train(Species ~., 
                  data = iris, 
                  method = 'knn')

lgrid <- expand.grid(Petal.Length=seq(1, 5, by=0.1), 
                     Petal.Width=seq(0.1, 1.8, by=0.1),
                     Sepal.Length = 5.4,
                     Sepal.Width=3.1)
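
Since you asked for a general approach, the grid step could be wrapped in a small helper. This is only a sketch: make_boundary_grid is a hypothetical name, not a caret function, and holding the remaining numeric predictors at their median is an assumption (any fixed value would do).

```r
# Hypothetical helper (not part of caret): build a prediction grid over two
# chosen numeric predictors, holding all other numeric columns at their median.
make_boundary_grid <- function(data, xvar, yvar, n = 50) {
  numeric_cols <- names(data)[sapply(data, is.numeric)]
  others <- setdiff(numeric_cols, c(xvar, yvar))

  axes <- list(
    seq(min(data[[xvar]]), max(data[[xvar]]), length.out = n),
    seq(min(data[[yvar]]), max(data[[yvar]]), length.out = n)
  )
  names(axes) <- c(xvar, yvar)

  grid <- expand.grid(axes)            # xvar varies fastest
  for (v in others) {
    grid[[v]] <- median(data[[v]])     # fixed value for every grid point
  }
  grid
}

g <- make_boundary_grid(iris, "Petal.Length", "Petal.Width", n = 40)
```

The result can be passed as newdata to predict in the same way as lgrid. Factor predictors are not handled here, and the response column should be left out of data if it is numeric.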

Next, simply use the predict function as you did in your own code.

knnPredGrid <- predict(knnModel, newdata=lgrid)
knnPredGrid = as.numeric(knnPredGrid) # 1 2 3
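
The numbers produced by as.numeric are just the positions of each class in the factor's levels, so they can be mapped back to class names if you want a legend. A minimal sketch, using a hand-made pred vector in place of real model output:

```r
# Stand-in for predict() output: a factor with the iris class levels
pred <- factor(c("setosa", "virginica", "versicolor"),
               levels = levels(iris$Species))

codes <- as.numeric(pred)       # position of each prediction in levels(pred)
labels <- levels(pred)[codes]   # map the codes back to class names

# With a plot already open, the same mapping gives a legend, e.g.:
# legend("topleft", legend = levels(pred), col = seq_along(levels(pred)), pch = 19)
```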

And then construct the graph:

pl = seq(1, 5, by=0.1)
pw = seq(0.1, 1.8, by=0.1)

probs <- matrix(knnPredGrid, length(pl), 
                 length(pw))

contour(pl, pw, probs, labels="", xlab="", ylab="", main=
          "X-nearest neighbour", axes=FALSE)

gd <- expand.grid(x=pl, y=pw)

points(gd, pch=".", cex=5, col=probs)
box()   

This should yield an output like this:

[image: iris]


To add test/train results from your model, you can follow what I've done below. The only difference is that you also need to plot the predicted points (these are not the same as the grid points that were used to generate the boundary).

library(caret) 
data(iris)

indxTrain <- createDataPartition(y = iris[, names(iris) == "Species"], p = 0.7, list = F)

train <- iris[indxTrain,]
test <- iris[-indxTrain,]

knnModel <- train(Species ~.,
                  data = train,
                  method = 'knn')

pl = seq(min(test$Petal.Length), max(test$Petal.Length), by=0.1)
pw = seq(min(test$Petal.Width), max(test$Petal.Width), by=0.1)

# generates the boundaries for your graph
lgrid <- expand.grid(Petal.Length=pl, 
                     Petal.Width=pw,
                     Sepal.Length = 5.4,
                     Sepal.Width=3.1)

knnPredGrid <- predict(knnModel, newdata=lgrid)
knnPredGrid = as.numeric(knnPredGrid)

# get the points from the test data...
testPred <- predict(knnModel, newdata=test)
testPred <- as.numeric(testPred)
# this gets the points for the testPred...
test$Pred <- testPred

probs <- matrix(knnPredGrid, length(pl), length(pw))

contour(pl, pw, probs, labels="", xlab="", ylab="", main="X-Nearest Neighbor", axes=F)
gd <- expand.grid(x=pl, y=pw)

points(gd, pch=".", cex=5, col=probs)

# add the test points to the graph
points(test$Petal.Length, test$Petal.Width, col=test$Pred, cex=2)
box()


Alternatively, you can use ggplot2 to do the plotting, which might be easier:

library(ggplot2)

# keep the grid predictions in the same data frame as the grid coordinates
lgrid$Pred <- knnPredGrid

ggplot(data=lgrid, aes(x=Petal.Length, y=Petal.Width)) +
  stat_contour(aes(z=Pred), bins=2) +
  geom_point(aes(colour=as.factor(Pred))) +
  # test points, coloured by their predicted class
  geom_point(data=test, aes(colour=as.factor(Pred)),
             size=5, alpha=0.5, shape=1) +
  theme_bw()

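
For a two-class problem such as the low/high wine quality split in your question, you can also contour the predicted class probability at 0.5, as in Step 3, instead of contouring class codes. Below is a rough sketch under some assumptions: it uses a two-class subset of iris as a stand-in for the wine data, fits on only the two plotted predictors so nothing else has to be held fixed, and the names two, fit and prob_mat are made up for the example.

```r
library(caret)

set.seed(300)

# Two-class stand-in for the low/high labels; drop the unused "setosa" level
two <- droplevels(iris[iris$Species != "setosa", ])

fit <- train(Species ~ Petal.Length + Petal.Width,
             data = two,
             method = "knn",
             trControl = trainControl(method = "cv", number = 5,
                                      classProbs = TRUE))

pl <- seq(min(two$Petal.Length), max(two$Petal.Length), length.out = 60)
pw <- seq(min(two$Petal.Width), max(two$Petal.Width), length.out = 60)
grid <- expand.grid(Petal.Length = pl, Petal.Width = pw)

# type = "prob" returns one probability column per class level
p <- predict(fit, newdata = grid, type = "prob")[, "virginica"]
prob_mat <- matrix(p, length(pl), length(pw))

# The 0.5 contour is where the predicted class flips
contour(pl, pw, prob_mat, levels = 0.5, drawlabels = FALSE, lwd = 2,
        xlab = "Petal.Length", ylab = "Petal.Width")
points(two$Petal.Length, two$Petal.Width, col = as.numeric(two$Species))
```

With more than two predictors in the model, the grid would again need a fixed value for each of the others, exactly as with lgrid above.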

answered Sep 20 '22 00:09 by chappers