Predicting Values with k-Means Clustering Algorithm

I'm messing around with machine learning, and I've written a k-means implementation in Python. It takes two-dimensional data and organises it into clusters. Each data point also has a class value of either 0 or 1.

What confuses me is how I can then use the algorithm to predict the class of points in another two-dimensional data set, where the 0 or 1 value is unknown. For each cluster, should I average the class values of the points within it and round to either 0 or 1, so that an unknown point closest to that cluster takes on the averaged value? Or is there a smarter method?

Cheers!

asked Nov 19 '11 10:11

DizzyDoo

1 Answer

To assign a new data point to one of a set of clusters created by k-means, you just find the centroid nearest to that point.

In other words, you repeat the same step you used for the iterative assignment of each point in your original data set to one of the k clusters. The only difference is that the centroids you use for this computation are the final set, i.e., the values of the centroids at the last iteration.

Here's one implementation in Python (w/ NumPy):

>>> import numpy as np
>>> # made-up values based on your spec (2D data, 2 clusters)
>>> centroids = np.array([[54, 85],
...                       [99, 78]])

>>> # a new, unlabelled data point within the problem domain
>>> new_data = np.array([67, 78])

>>> # to assign the new point to a cluster, find its closest centroid
>>> diff = centroids - new_data  # NumPy broadcasting: subtracts the point from each row
>>> diff
array([[-13,   7],
       [ 32,   0]])

>>> dist = np.sqrt(np.sum(diff**2, axis=-1))  # Euclidean distance to each centroid
>>> dist
array([14.76482306, 32.        ])

>>> closest_centroid = centroids[np.argmin(dist)]
>>> closest_centroid
array([54, 85])
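
The nearest-centroid step only gives you a cluster, not a 0 or 1. One way to get from a cluster to a class value is the scheme proposed in the question: give each cluster the majority class of the labelled points assigned to it, then let a new point inherit the label of its nearest cluster. Below is a minimal sketch of that idea, not part of k-means itself (k-means never looks at the class values). The function names (`nearest_centroid`, `majority_label_per_cluster`, `predict`) and the training data are made up for illustration, and ties are arbitrarily resolved in favour of 1.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Return, for each row of `points`, the index of its closest centroid."""
    points = np.atleast_2d(points)
    # Broadcasting gives all pairwise differences: shape (n_points, k, n_dims)
    diff = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distances, shape (n_points, k)
    return dist.argmin(axis=1)

def majority_label_per_cluster(train_points, train_labels, centroids):
    """Majority 0/1 class among the labelled points assigned to each cluster."""
    cluster_ids = nearest_centroid(train_points, centroids)
    labels = np.zeros(len(centroids), dtype=int)
    for k in range(len(centroids)):
        members = train_labels[cluster_ids == k]
        if members.size:  # assumption: an empty cluster keeps the default label 0
            labels[k] = int(members.mean() >= 0.5)  # assumption: ties go to 1
    return labels

def predict(new_points, centroids, cluster_labels):
    """Each new point takes the label of its nearest cluster."""
    return cluster_labels[nearest_centroid(new_points, centroids)]

# Made-up example data: final centroids plus a few labelled training points
centroids = np.array([[54, 85], [99, 78]])
train_points = np.array([[50, 80], [58, 90], [55, 83], [100, 75], [95, 80]])
train_labels = np.array([0, 0, 1, 1, 1])

cluster_labels = majority_label_per_cluster(train_points, train_labels, centroids)
predicted = predict(np.array([67, 78]), centroids, cluster_labels)
print(cluster_labels, predicted)
```

With this data the first cluster holds labels 0, 0, 1 and the second holds 1, 1, so the new point (67, 78), which is nearest to the first centroid, is predicted as 0. How well this works depends on how cleanly the clusters separate the two classes; a cluster with a near 50/50 mix gives an unreliable label.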

answered Sep 20 '22 01:09

doug

Tags: python, machine-learning, k-means, data-mining, prediction
