
knn predictions with Clustering

I have a dataset of 60,000 observations and 40 variables on which I used Clara, mainly due to memory constraints.

library(cluster)    
library(dplyr)    

mutate(kddnew, Att=ifelse(Class=="normal","normal", "attack"))
ds <- dat[, c(-20, -21, -40)]

clus <- clara(ds, 3, samples=500, sampsize=100, pamLike=TRUE)

This returned a table with medoids.

Now I'm trying to use knn to do a prediction like this:

medoidz <- clus$medoids
r <- knn(medoidz, ds, cl=ds$targetvariable)

And it returns

'train' and 'class' have different lengths

Can someone please shed some light on how to use it?

asked Nov 28 '25 06:11 by Raw Data



1 Answer

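knn requires cl to have exactly one label per row of train. In your call, train is the medoid table (3 rows) while cl has one entry per observation in ds, hence the error. Using the clustered rows as train and clus$clustering as cl works: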

require(cluster)
require(class)

data(iris)
ds   <- iris
ds$y <- as.numeric(ds$Species)
ds$Species <- NULL

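# Random 60/40 split: sample row indices (rbinom() would only yield 0, 1 or 2, not row numbers)
idx      <- sample(nrow(ds), size = round(0.6 * nrow(ds)))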
training <- ds[idx,]
testing  <- ds[-idx,]
x        <- training
y        <- training$y
x1       <- testing
y1       <- testing$y

clus <- clara(x, 3, samples = 1, sampsize = nrow(x), pamLike=TRUE)

knn(train = x, test = x1, cl = clus$clustering, k = 10, l = 0, prob = TRUE, use.all = TRUE)
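
If the goal is to keep the medoids themselves as the training set, as in the question, one possible variation is to give each medoid its own label and use k = 1, so every test row is assigned to its nearest medoid. This is only a sketch: the iris columns, the 100/50 split, the seed and the clara sampling settings are stand-in assumptions, not values from the question.

```r
library(cluster)  # clara()
library(class)    # knn()

set.seed(42)                        # arbitrary seed, for reproducibility only
feats <- iris[, 1:4]                # numeric columns only (stand-in data)
idx   <- sample(nrow(feats), 100)   # hypothetical 100/50 train/test split
train <- feats[idx, ]
test  <- feats[-idx, ]

clus <- clara(train, 3, samples = 50, sampsize = 40, pamLike = TRUE)

# One label per medoid row, so 'train' and 'cl' have the same length
medoid_id <- factor(seq_len(nrow(clus$medoids)))

# k = 1: each test row receives the label of its closest medoid
pred <- knn(train = clus$medoids, test = test, cl = medoid_id, k = 1)
table(pred)
```

With only one training point per cluster, k has to stay at 1; larger values of k need the full set of clustered rows as train, as in the call above.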

Note, though, that 3 is clearly a poor choice for the number of clusters in this dataset, so the prediction isn't good. Hopefully you'll choose a suitable number of clusters for your data; you can test your prediction strength with prediction.strength from the fpc package, or in other ways.
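
As one of the "other ways", a rough sketch is to compare the average silhouette width that clara reports for several candidate cluster counts. This is a different technique from prediction.strength, and the range 2:6, the seed and the sampling settings below are assumptions chosen for illustration.

```r
library(cluster)  # clara()

set.seed(42)                  # arbitrary seed, for reproducibility only
feats <- iris[, 1:4]          # stand-in data, numeric columns only

ks <- 2:6                     # candidate cluster counts (hypothetical range)
avg_width <- sapply(ks, function(k) {
  fit <- clara(feats, k, samples = 50, sampsize = 60, pamLike = TRUE)
  fit$silinfo$avg.width       # average silhouette width on clara's best sample
})

# Higher average width suggests better-separated clusters
data.frame(k = ks, avg_width = round(avg_width, 3))
best_k <- ks[which.max(avg_width)]
```

Because clara computes the silhouette on its best sample rather than on all rows, the result is an estimate and can shift with samples and sampsize.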

answered Nov 29 '25 20:11 by Hack-R



