
Can k-means clustering do classification?


I want to know whether the k-means clustering algorithm can be used for classification.

Suppose I have a lot of data and run a simple k-means clustering on it, using Euclidean distance, and get two clusters, A and B.

Cluster A is on the left side.

Cluster B is on the right side.

Now I get one new data point. What should I do?

  1. Run the k-means clustering algorithm again and see which cluster the new data point ends up in?

  2. Keep the final centroids and use Euclidean distance to decide which cluster the new data point belongs to?

  3. Some other method?

Sirius Wang asked Mar 10 '14 13:03




2 Answers

The simplest method is, of course, option 2: assign each object to the closest centroid. (Technically, compare squared Euclidean distances, i.e. sums of squares, rather than Euclidean distances; this matches the k-means objective and saves you a sqrt computation. The nearest centroid is the same either way.)

Method 1 is fragile, because k-means may give you a completely different solution, particularly if it didn't fit your data well in the first place (e.g. too many dimensions, clusters of very different sizes, too many clusters, ...).
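As a minimal sketch of option 2, assuming the centroids from the earlier k-means run were saved (the centroid values and the names `squared_distance` and `assign_to_cluster` below are made up for illustration):

```python
def squared_distance(a, b):
    """Sum of squared coordinate differences (no square root needed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_cluster(point, centroids):
    """Return the label of the centroid closest to `point`."""
    return min(centroids, key=lambda label: squared_distance(point, centroids[label]))


# Hypothetical centroids kept from a previous k-means run:
# cluster A on the left, cluster B on the right.
centroids = {"A": (1.0, 2.0), "B": (8.0, 2.5)}

new_point = (2.0, 3.0)
print(assign_to_cluster(new_point, centroids))  # A
```

Because the square root is monotonic, ranking by squared distance picks the same centroid as ranking by Euclidean distance.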

However, the following method may be even more reasonable:

3. Train an actual classifier.

Yes, you can use k-means to produce an initial partitioning, then assume that the k-means partitions are reasonable classes (you really should validate this at some point, though), and then continue as you would if the data had been labeled by a user.

I.e. run k-means, then train an SVM on the resulting cluster labels, and use the SVM for classification.
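A rough sketch of this pipeline, assuming scikit-learn and NumPy are available. The synthetic two-blob data, the linear kernel, and the parameter values are illustrative choices, not part of the original answer:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Synthetic stand-in data: one blob on the left, one on the right.
rng = np.random.default_rng(0)
left = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
right = rng.normal(loc=(5.0, 0.0), scale=0.5, size=(50, 2))
X = np.vstack([left, right])

# Step 1: k-means produces the initial partitioning.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
pseudo_labels = kmeans.labels_

# Step 2: treat the cluster assignments as class labels and train an SVM.
clf = SVC(kernel="linear").fit(X, pseudo_labels)

# Step 3: classify new points with the SVM; k-means is not rerun.
new_points = np.array([[0.2, -0.1], [4.8, 0.3]])
predicted = clf.predict(new_points)
print(predicted)
```

The label numbers (0 or 1) are arbitrary cluster indices, so what matters is which training points share a label, not the number itself.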

k-NN classification, or even assigning each object to the nearest cluster center (option 2), can be seen as very simple classifiers. The latter is a 1-NN classifier, "trained" on the cluster centroids only.

Has QUIT--Anony-Mousse answered Oct 12 '22 15:10


Yes, we can do classification.

I wouldn't say rerunning the algorithm itself (as in option 1) is particularly well-suited to classifying points, as incorporating the data to be classified into your training data tends to be frowned upon (unless you have a real-time system, but I think elaborating on this would get a bit far from the point).

To classify a new point, simply calculate the Euclidean distance to each cluster centroid to determine the closest one, then classify it under that cluster.

There are data structures that allow you to determine the closest centroid more efficiently (like a kd-tree), but the above is the basic idea.
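For illustration only, here is a small hand-rolled kd-tree over a set of centroids. The function names and centroid values are hypothetical, and with only a handful of centroids a plain loop is usually just as fast; a tree like this starts to pay off when there are many centroids.

```python
def squared_distance(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to `target`."""
    if node is None:
        return best
    point, left, right = node
    if best is None or squared_distance(target, point) < squared_distance(target, best):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Search the far side only if the splitting plane is closer
    # than the best match found so far.
    if diff ** 2 < squared_distance(target, best):
        best = nearest(far, target, depth + 1, best)
    return best


# Hypothetical centroids from an earlier k-means run.
centroids = [(1.0, 2.0), (8.0, 2.5), (4.0, 9.0), (7.0, 7.0)]
tree = build_kdtree(centroids)
print(nearest(tree, (6.5, 6.0)))  # (7.0, 7.0)
```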

Bernhard Barker answered Oct 12 '22 13:10