I want to know whether the k-means clustering algorithm can be used for classification. Suppose I have a lot of data and run a simple k-means clustering on it, which gives me 2 clusters, A and B, with Euclidean distance as the distance measure. Cluster A is on the left side and cluster B is on the right side. Now, if I get one new data point, what should I do? <ol> <li>Run the k-means clustering algorithm again and see which cluster the new data point ends up in?</li> <li>Record the final centroids and use Euclidean distance to decide which cluster the new data point belongs to?</li> <li>Some other method?</li> </ol>

Can k-means clustering do classification?

2 Answers

The simplest method, of course, is 2.: assign each object to the closest centroid (technically, compare sums of squares rather than Euclidean distances; this is the quantity k-means actually minimizes, it picks the same centroid, and it saves you a sqrt computation).
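A minimal sketch of option 2, assuming the centroids from the earlier k-means run were saved. The centroid coordinates and the names `squared_distance` and `assign_to_centroid` are made up for illustration; only the nearest-centroid rule itself comes from the answer.

```python
def squared_distance(p, q):
    """Sum of squared coordinate differences (no square root needed)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_to_centroid(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


# Hypothetical centroids saved from an earlier k-means run:
# index 0 plays the role of cluster A (left), index 1 of cluster B (right).
centroids = [(-2.0, 0.0), (3.0, 0.5)]

new_point = (2.2, -1.0)
label = assign_to_centroid(new_point, centroids)
print("AB"[label])
```

Because the square root is monotonic, ranking centroids by squared distance gives the same winner as ranking them by Euclidean distance.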

Method 1. is fragile, as k-means may give you a completely different solution, in particular if it didn't fit your data well in the first place (e.g. too many dimensions, clusters of very different sizes, too many clusters, ...).
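To illustrate the mildest form of this fragility, here is a small sketch of plain Lloyd iterations run twice on the same toy data. The data, the function name `lloyd`, and the hand-picked starting centroids (standing in for random initialization) are all assumptions for the example. The grouping is identical in both runs, but the cluster indices are swapped, so "cluster 0" does not mean the same thing from one run to the next.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centroids, iterations=10):
    """Plain Lloyd iterations from the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared distance.
        labels = [min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels, centroids


# Toy data: three points on the left, three on the right.
points = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0),
          (9.0, 0.0), (10.0, 1.0), (9.5, 0.5)]

# Same data, two different starting orders for the centroids.
labels_1, _ = lloyd(points, [(0.0, 0.0), (10.0, 0.0)])
labels_2, _ = lloyd(points, [(10.0, 0.0), (0.0, 0.0)])

print(labels_1)
print(labels_2)
```

On data that k-means fits poorly, the partition itself can change between runs, not just the numbering.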

However, the following method may be even more reasonable:

3. Train an actual classifier.

Yes, you can use k-means to produce an initial partitioning, then assume that the k-means partitions could be reasonable classes (you really should validate this at some point though), and then continue as you would if the data had been user-labeled.

I.e. run k-means, train an SVM on the resulting clusters. Then use the SVM for classification.

k-NN classification, or even assigning each object to the nearest cluster center (option 2), can be seen as very simple classifiers. The latter is a 1NN classifier, "trained" on the cluster centroids only.
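A sketch of the "cluster first, then train a classifier" idea. To keep it dependency-free it swaps the SVM for a tiny k-NN classifier written by hand; with a library such as scikit-learn you would fit an SVM on the same points and labels instead. The training points, the cluster labels, and the function name `knn_classify` are hypothetical.

```python
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_classify(point, train_points, train_labels, k=3):
    """Majority vote among the k training points nearest to `point`."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: squared_distance(point, train_points[i]))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data with cluster labels as k-means might have produced them
# (hypothetical values; 0 = cluster A, 1 = cluster B).
train_points = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0),
                (9.0, 0.0), (10.0, 1.0), (9.5, 0.5)]
cluster_labels = [0, 0, 0, 1, 1, 1]

# The cluster labels are treated exactly like user-provided class labels.
predicted = knn_classify((8.0, 0.2), train_points, cluster_labels)
print(predicted)
```

Unlike the centroid-only rule, a classifier trained on all labeled points can learn a boundary that is not simply the midline between two centroids.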


answered Oct 12 '22 15:10

Has QUIT--Anony-Mousse

Yes, we can do classification.

I wouldn't say the algorithm itself (like #1) is particularly well-suited to classifying points, as incorporating data to be classified into your training data tends to be frowned upon (unless you have a real-time system, but I think elaborating on this would get a bit far from the point).

To classify a new point, simply calculate the Euclidean distance to each cluster centroid to determine the closest one, then classify it under that cluster.

There are data structures that allow you to determine the closest centroid more efficiently (like a kd-tree), but the above is the basic idea.
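For the curious, a rough sketch of a kd-tree lookup over centroids. This is a bare-bones illustration rather than a tuned implementation, it only pays off when there are many centroids, and the centroid values and the names `build_kdtree` and `nearest` are invented for the example.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_kdtree(items, depth=0):
    """items: list of (point, label). Returns nested tuples or None."""
    if not items:
        return None
    axis = depth % len(items[0][0])  # cycle through the coordinates
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid],
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1),
            axis)


def nearest(node, target, best=None):
    """Return the (point, label) item closest to `target`."""
    if node is None:
        return best
    item, left, right, axis = node
    if (best is None or
            squared_distance(target, item[0]) < squared_distance(target, best[0])):
        best = item
    diff = target[axis] - item[0][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer
    # than the best match found so far.
    if diff ** 2 < squared_distance(target, best[0]):
        best = nearest(far, target, best)
    return best


# Hypothetical centroids, each paired with its cluster index.
centroids = [((0.0, 0.0), 0), ((5.0, 5.0), 1), ((10.0, 0.0), 2),
             ((0.0, 10.0), 3), ((10.0, 10.0), 4)]
tree = build_kdtree(centroids)

point, cluster = nearest(tree, (9.0, 1.0))
print(cluster)
```

With only a handful of centroids, the simple loop over all of them is usually just as fast and much easier to get right.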

answered Oct 12 '22 13:10

Bernhard Barker

Sirius Wang
