Plot KMeans clusters and classification for 1-dimensional data

Question

I am using KMeans to cluster three time-series datasets with different characteristics. For reproducibility, I am sharing the data here.

Here is my code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

protocols = {}

types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time,col_window = np.loadtxt(fname,delimiter=',').T
    trailing_window = col_window[:-1] # "past" values at a given index
    leading_window  = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds]/trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }



k_means = KMeans(copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=10, random_state=0, tol=0.0001, verbose=0)
k_means.fit(quotient.reshape(-1,1))

Given a new data point (with quotient and quotient_times), I want to know which cluster it belongs to. My plan is to build each dataset by stacking the two transformed features, quotient and quotient_times, and to cluster it with KMeans.

k_means.labels_ gives this output: array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)

Finally, I want to visualize the clusters using plt.plot(k_means, ".",color="blue") but I am getting this error: TypeError: float() argument must be a string or a number, not 'KMeans'. How do we plot KMeans clusters?

mfitzp · Accepted Answer

What you're effectively looking for is a range of values between which points are considered to be in a given class. It's quite unusual to use KMeans to classify 1d data in this way, although it certainly works. As you've noticed, you need to convert your input data to a 2d array in order to use the method.

k_means = KMeans(copy_x=True, init='k-means++', max_iter=300,
    n_clusters=3, n_init=10, random_state=0, tol=0.0001, verbose=0)

quotient_2d = quotient.reshape(-1,1)
k_means.fit(quotient_2d)

You will need the quotient_2d again for the classification (prediction) step later.
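The reshape matters because scikit-learn estimators expect a 2d array of shape (n_samples, n_features), so a 1d vector of n values has to become an n-by-1 column. A minimal sketch of what the reshape does, using made-up stand-in values rather than the data from the question:

```python
import numpy as np

# Made-up stand-in for `quotient`; the real values come from the CSV files.
quotient = np.array([0.91, 0.45, 0.88, 0.12, 0.50])

# -1 lets NumPy infer the number of rows; 1 fixes a single feature column.
quotient_2d = quotient.reshape(-1, 1)

print(quotient.shape)     # (5,)
print(quotient_2d.shape)  # (5, 1)
```

The values themselves are unchanged; only the shape differs, which is why the original 1d quotient can still be used for indexing and plotting later.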

First we can plot the centroids. Since the data is 1d, the x-axis position is arbitrary.

colors = ['r','g','b']
centroids = k_means.cluster_centers_
for n, y in enumerate(centroids):
    plt.plot(1, y, marker='x', color=colors[n], ms=10)
plt.title('Kmeans cluster centroids')

This produces the following plot.

[Image: cluster centroids]

To get cluster membership for the points, pass quotient_2d to .predict. This returns an array of numbers for class membership, e.g.

>>> Z = k_means.predict(quotient_2d)
>>> Z
array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)

We can use this to filter our original data, plotting each class in a separate color.

# Plot each class as a separate colour
n_clusters = 3 
for n in range(n_clusters):
    # Filter data points to plot each in turn.
    ys = quotient[ Z==n ]
    xs = quotient_times[ Z==n ]

    plt.scatter(xs, ys, color=colors[n])

plt.title("Points by cluster")

This generates the following plot of the original data, with each point coloured by cluster membership.

[Image: points coloured by cluster]
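To get the "range of values" for each class, and to classify a new point, it helps that with a single feature KMeans assigns each point to its nearest centroid, so the boundaries between clusters are the midpoints between neighbouring centroids. The sketch below illustrates this on synthetic values (not the data from the question); the names rng, centers, boundaries and new_points are made up for the example, and it is one possible way to read off the ranges rather than the only one.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for `quotient`: three well-separated groups in (0, 1).
rng = np.random.default_rng(0)
quotient = np.concatenate([
    rng.normal(0.2, 0.02, 20),
    rng.normal(0.5, 0.02, 20),
    rng.normal(0.8, 0.02, 20),
])
quotient_2d = quotient.reshape(-1, 1)

k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(quotient_2d)

# Cluster labels come in arbitrary order, so sort the centroids first.
centers = np.sort(k_means.cluster_centers_.ravel())

# In 1d a point belongs to the nearest centroid, so the class boundaries
# are the midpoints between adjacent centroids.
boundaries = (centers[:-1] + centers[1:]) / 2
print("centroids: ", centers)
print("boundaries:", boundaries)

# Classify new points: predict() needs the same (n_samples, 1) shape.
new_points = np.array([0.25, 0.79]).reshape(-1, 1)
new_labels = k_means.predict(new_points)
print("labels:", new_labels)
```

The label numbers returned by predict are only identifiers; they do not follow the order of the centroid values, so compare them against cluster_centers_ when you need to know which range a label corresponds to.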

Tags: python, matplotlib, machine-learning, k-means, scikit-learn
