I am using KMeans to cluster three time-series datasets with different characteristics. For reproducibility, I am sharing the data here.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

protocols = {}
types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]    # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]
    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }

k_means = KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
                 n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
                 random_state=0, tol=0.0001, verbose=0)
k_means.fit(quotient.reshape(-1, 1))
Given a new data point (with its quotient and quotient_times), I want to know which cluster it belongs to, building the clustering from these two transformed features, quotient and quotient_times, with KMeans.
k_means.labels_ gives this output:
array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)
Finally, I want to visualize the clusters using plt.plot(k_means, ".",color="blue") but I am getting this error: TypeError: float() argument must be a string or a number, not 'KMeans'. How do we plot KMeans clusters?
What you're effectively looking for is a range of values between which points are considered to be in a given class. It's quite unusual to use KMeans to classify 1-D data in this way, although it certainly works. As you've noticed, you need to convert your input data to a 2-D array in order to use the method.
# n_jobs and precompute_distances were removed in scikit-learn 1.0;
# the remaining parameters match the values used in the question.
k_means = KMeans(n_clusters=3, init='k-means++', n_init=10,
                 max_iter=300, tol=0.0001, random_state=0)
quotient_2d = quotient.reshape(-1, 1)
k_means.fit(quotient_2d)
You will need the quotient_2d again for the classification (prediction) step later.
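Because the clustering is 1-D, the "range of values" for each class falls out directly: the decision boundary between two adjacent clusters is just the midpoint between their (sorted) centroids. A minimal sketch, using synthetic stand-in data for quotient:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 1-D data standing in for `quotient`: three loose groups of values.
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(0.2, 0.02, 20),
    rng.normal(0.5, 0.02, 20),
    rng.normal(0.9, 0.02, 20),
])

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(values.reshape(-1, 1))

# Sort the centroids, then take midpoints between neighbours: any value
# below the first boundary belongs to the lowest cluster, and so on.
centers = np.sort(k_means.cluster_centers_.ravel())
boundaries = (centers[:-1] + centers[1:]) / 2
print(boundaries)  # two cut points separating the three clusters
```

These cut points give you an interpretable rule for classifying future quotient values without calling the model at all.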
First we can plot the centroids. Since the data is 1-D, the x-axis position is arbitrary.
colors = ['r', 'g', 'b']
centroids = k_means.cluster_centers_
for n, y in enumerate(centroids):
    plt.plot(1, y, marker='x', color=colors[n], ms=10)
plt.title('KMeans cluster centroids')
This produces the following plot.

To get cluster membership for the points, pass quotient_2d to .predict. This returns an array of numbers for class membership, e.g.
>>> Z = k_means.predict(quotient_2d)
>>> Z
array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)
We can use this to filter our original data, plotting each class in a separate color.
# Plot each class as a separate colour
n_clusters = 3
for n in range(n_clusters):
    # Filter data points to plot each in turn.
    ys = quotient[Z == n]
    xs = quotient_times[Z == n]
    plt.scatter(xs, ys, color=colors[n])
plt.title("Points by cluster")
This generates the following plot with the original data, each point coloured by the cluster membership.
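To answer the original question about new data points: once the model is fitted, .predict assigns a cluster to any new quotient value, which must be reshaped to 2-D just like the training data. A self-contained sketch with made-up stand-in values:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in training data for `quotient` (the real values come from the CSVs).
quotient = np.array([0.1, 0.12, 0.11, 0.5, 0.52, 0.9, 0.91, 0.93])

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(quotient.reshape(-1, 1))

# A new observation must also be 2-D: shape (n_samples, n_features).
new_quotient = np.array([[0.89]])
label = k_means.predict(new_quotient)[0]

# predict assigns the cluster whose centroid is nearest to 0.89.
nearest = np.argmin(np.abs(k_means.cluster_centers_.ravel() - 0.89))
print(label == nearest)  # True
```

Note that KMeans ignores quotient_times here: if you want both features to influence cluster membership, fit on np.column_stack([quotient_times, quotient]) instead of the reshaped quotient alone.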
