How to set k-Means clustering labels from highest to lowest with Python?

I have a dataset of 38 apartments and their electricity consumption in the morning, afternoon and evening. I am trying to cluster this dataset using the k-Means implementation from scikit-learn, and am getting some interesting results.

First clustering results:

This is all very well, and with 4 clusters I obviously get one of 4 labels (0, 1, 2 or 3) assigned to each apartment. Using the random_state parameter of KMeans, I can fix the seed with which the centroids are randomly initialized, so the same apartments consistently get the same labels.

However, since this case concerns energy consumption, the clusters can be meaningfully ranked from the lowest to the highest consumers. I would therefore like to assign label 0 to the apartments with the lowest consumption level, label 1 to apartments that consume a bit more, and so on.

As of now, my labels are [2 1 3 0], or ["black", "green", "blue", "red"]; I would like them to be [0 1 2 3] or ["red", "green", "black", "blue"]. How should I proceed to do so, while still keeping the centroid initialization random (with fixed seed)?

Thank you very much for the help!

asked Jul 03 '17 14:07

Sergio

1 Answer

Transforming the labels through a lookup table is a straightforward way to achieve what you want.

To begin with I generate some mock data:

import numpy as np

np.random.seed(1000)

n = 38  # number of apartments
X_morning = np.random.uniform(low=.02, high=.18, size=n)
X_afternoon = np.random.uniform(low=.05, high=.20, size=n)
X_night = np.random.uniform(low=.025, high=.175, size=n)
X = np.vstack([X_morning, X_afternoon, X_night]).T  # shape (n, 3)

Then I perform clustering on the data:

from sklearn.cluster import KMeans
k = 4
kmeans = KMeans(n_clusters=k, random_state=0).fit(X)

And finally I use NumPy's argsort to create a lookup table like this:

idx = np.argsort(kmeans.cluster_centers_.sum(axis=1))
lut = np.zeros_like(idx)
lut[idx] = np.arange(k)

Sample run:

In [70]: kmeans.cluster_centers_.sum(axis=1)
Out[70]: array([ 0.3214523 ,  0.40877735,  0.26911353,  0.25234873])

In [71]: idx
Out[71]: array([3, 2, 0, 1], dtype=int64)

In [72]: lut
Out[72]: array([2, 3, 1, 0], dtype=int64)

In [73]: kmeans.labels_
Out[73]: array([1, 3, 1, ..., 0, 1, 0])

In [74]: lut[kmeans.labels_]
Out[74]: array([3, 0, 3, ..., 2, 3, 2], dtype=int64)

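idx lists the original cluster labels ordered from lowest to highest consumption level, where consumption is measured as the sum of each centroid's coordinates. The apartments for which lut[kmeans.labels_] is 0 belong to the cluster with the lowest consumption level, and those for which it is 3 belong to the cluster with the highest.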
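
If you need this more than once, the steps above can be wrapped in a small helper. This is only a sketch: the function name relabel_by_total is made up for illustration, the data are mock values rather than real measurements, and n_init=10 is set explicitly only to keep the behaviour the same across scikit-learn versions. It also returns the centroids reordered to match the new labels, and maps the new labels to the colours from the question.

```python
import numpy as np
from sklearn.cluster import KMeans


def relabel_by_total(kmeans):
    """Return (labels, centers) renumbered so that label 0 is the cluster
    whose centroid has the smallest coordinate sum (hypothetical helper)."""
    # Old cluster labels ordered from lowest to highest total consumption
    order = np.argsort(kmeans.cluster_centers_.sum(axis=1))
    # Lookup table: position = old label, value = new label
    lut = np.empty_like(order)
    lut[order] = np.arange(order.size)
    return lut[kmeans.labels_], kmeans.cluster_centers_[order]


# Mock data (assumption: 38 apartments, 3 consumption features)
rng = np.random.RandomState(1000)
X = rng.uniform(low=0.02, high=0.20, size=(38, 3))

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
new_labels, new_centers = relabel_by_total(kmeans)

# Colours from the question, indexed by the new labels (0 = lowest consumers)
colors = np.array(["red", "green", "black", "blue"])
point_colors = colors[new_labels]
```

Because the centroids are reordered together with the labels, new_centers[new_labels] still gives each apartment the same centroid it had before the relabelling.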

answered Oct 21 '22 06:10

Tonechas

Tags: python, sorting, numpy, k-means, scikit-learn