I am trying to fit a KNN model and obtain a decision boundary using the Auto data set in the ISLR package in R.
I am having difficulty identifying the decision boundary for a 3-class problem. This is my code so far; I am not getting the decision boundary.
I saw an answer to this type of question elsewhere on this site using ggplot, but I want to get the answer the classical way, using the plot function.
library("ISLR")
trainxx=Auto[,c(1,3)]
trainyy=(Auto[,8])
n.grid1 <- 50
x1.grid1 <- seq(from = min(trainxx[, 1]), to = max(trainxx[, 1]), length.out = n.grid1)
x2.grid1 <- seq(from = min(trainxx[, 2]), to = max(trainxx[, 2]), length.out = n.grid1)
grid <- expand.grid(x1.grid1, x2.grid1)
library("class")
mod.opt <- knn(trainxx, grid, trainyy, k = 10, prob = T)
prob_knn <- attr(mod.opt, "prob")
My problem is mainly after this code segment. I am pretty sure I have to modify the following segment, but I don't know how. Do I need to use a nested `ifelse` here?
prob_knn <- ifelse(mod.opt == "3", prob_knn, 1 - prob_knn)
prob_knn <- matrix(prob_knn, n.grid1, n.grid1)
plot(trainxx, col = ifelse(trainyy == "3", "green", ifelse(trainyy == "2", "red", "blue")))
title(main = "plot of training data with decision boundary, k = 10")
contour(x1.grid1, x2.grid1, prob_knn, levels = 0.5, labels = "", xlab = "", ylab = "",
main = "", add = T , pch=20)
It would be a great help if anyone could suggest a way to solve this issue.
Basically I need something like this, but for a 3-class problem: https://stats.stackexchange.com/questions/21572/how-to-plot-decision-boundary-of-a-k-nearest-neighbor-classifier-from-elements-o
The decision boundaries of kNN (the double lines in Figure 14.6) are locally linear segments, but in general they have a complex shape that is not equivalent to a line in 2D or a hyperplane in higher dimensions.
A decision boundary is the region of a problem space in which the output label of a classifier is ambiguous. If the decision surface is a hyperplane, then the classification problem is linear, and the classes are linearly separable. Decision boundaries are not always clear cut.
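To see that contrast concretely, you can overlay a linear classifier on the same grid. This is a minimal sketch, assuming the `trainxx`, `trainyy`, `grid`, `x1.grid1`, `x2.grid1` and `n.grid1` objects from the question, and that the kNN plot further down is already on the device; `MASS::lda` is used here purely as an example of a classifier with straight-line boundaries:

```r
library(MASS)  # lda() ships with standard R distributions

# Fit LDA on the same two predictors and predict the class over the grid
lda_fit  <- lda(trainxx, grouping = trainyy)
lda_pred <- predict(lda_fit, grid)$class

# Trace each class's (straight) boundary via a 0/1 indicator matrix,
# the same trick used for kNN below, and overlay as dashed lines
for (cl in unique(trainyy)) {
  ind <- matrix(as.numeric(lda_pred == cl), n.grid1, n.grid1)
  contour(x1.grid1, x2.grid1, ind, levels = 0.5,
          add = TRUE, lty = 2, drawlabels = FALSE)
}
```

The LDA boundaries come out as straight dashed lines, while the kNN boundaries wiggle around local clusters of points.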
Here is a tweaked approach that draws the decision boundaries as lines. I thought this would require the predicted probability for each class, but after reading this answer it turns out you can simply mark the predicted probability for each class as 1 wherever that class is predicted, and 0 otherwise.
# Create matrices for each class where p = 1 for any point
# where that class was predicted, 0 otherwise
n_classes = 3
class_regions = lapply(1:n_classes, function(class_num) {
  matrix(ifelse(mod.opt == class_num, 1, 0), n.grid1, n.grid1)
})
# Set up colours
class_colors = c("#4E79A7", "#F28E2B", "#E15759")
# Add some transparency to make the fill colours less bright
fill_colors = paste0(class_colors, "60")
# Use image to plot the predicted class at each point
classes = matrix(as.numeric(mod.opt), n.grid1, n.grid1)
image(x1.grid1, x2.grid1, classes, col = fill_colors,
      main = "plot of training data with decision boundary",
      xlab = colnames(trainxx)[1], ylab = colnames(trainxx)[2])
# Draw contours separately for each class
invisible(lapply(1:n_classes, function(class_num) {
  contour(x1.grid1, x2.grid1, class_regions[[class_num]],
          col = class_colors[class_num],
          levels = 0.5, add = TRUE, lwd = 2, drawlabels = FALSE)
}))
# Using pch = 21 for bordered points that stand out a bit better
points(trainxx, bg = class_colors[trainyy],
       col = "black", pch = 21)
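A legend helps map the colours back to the origin codes. A small optional addition, assuming `Auto$origin`'s documented coding of 1 = American, 2 = European, 3 = Japanese:

```r
# Hypothetical finishing touch: label the three origin classes
legend("topright",
       legend = c("American (1)", "European (2)", "Japanese (3)"),
       pt.bg = class_colors, pch = 21, col = "black", bg = "white")
```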
The resulting plot: