K-means algorithm variation with equal cluster size

Tags:

algorithm

map

cluster-analysis

k-means

I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promising, but does not produce equally sized groups.

Is there a variation of this algorithm or a different one that allows for an equal count of members for all clusters?

See also: Group n points in k clusters of equal size

627

asked Mar 27 '11 21:03

pixelistik

3 Answers

The ELKI data mining framework has a tutorial on equal-size k-means.

This is not a particularly good algorithm, but it is an easy enough k-means variation to write a tutorial for, and it teaches people how to implement their own clustering algorithm variation. Apparently some people really need their clusters to have the same size, although the SSQ quality will be worse than with regular k-means.

In ELKI 0.7.5, you can select this algorithm as tutorial.clustering.SameSizeKMeansAlgorithm.

answered Sep 16 '22 15:09

Erich Schubert

This might do the trick: apply Lloyd's algorithm to get k centroids. Sort the centroids by descending size of their associated clusters in an array. For i = 1 through k-1, push the data points in cluster i with minimal distance to any other centroid j (i < j ≤ k) off to j and recompute the centroid i (but don't recompute the cluster) until the cluster size is n / k.

The complexity of this postprocessing step is O(k² n lg n).
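A minimal sketch of this postprocessing pass in plain Python, assuming the labels and centroids come from an earlier Lloyd's run and that n is divisible by k. The names `rebalance` and `open_clusters` are made up for illustration. One deviation from the description above, also flagged in the code: instead of sorting the clusters once, the sketch picks the largest not-yet-finalized cluster in each round, so the cluster being trimmed always holds at least n / k points.

```python
from math import dist


def rebalance(points, labels, centroids):
    """Trim oversized clusters until each holds len(points) // k points."""
    k = len(centroids)
    target = len(points) // k  # assumption: len(points) is divisible by k
    labels = list(labels)
    centroids = [tuple(c) for c in centroids]
    open_clusters = set(range(k))  # clusters whose size is not final yet

    while len(open_clusters) > 1:
        sizes = {c: labels.count(c) for c in open_clusters}
        # Deviation from the answer: choose the largest open cluster each
        # round (ties go to the lower index) rather than sorting only once.
        i = max(open_clusters, key=lambda c: (sizes[c], -c))
        open_clusters.remove(i)
        members = [p for p, lab in enumerate(labels) if lab == i]
        # Candidate moves (distance, point index, destination), nearest first.
        moves = sorted(
            (dist(points[p], centroids[j]), p, j)
            for p in members
            for j in open_clusters
        )
        moved = set()
        for _, p, j in moves:
            if len(members) - len(moved) == target:
                break
            if p not in moved:
                labels[p] = j
                moved.add(p)
        # Centroid i is never consulted while trimming cluster i itself,
        # so recomputing it once after the trimming is enough here.
        kept = [points[p] for p in members if p not in moved]
        if kept:
            centroids[i] = tuple(sum(x) / len(kept) for x in zip(*kept))
    return labels, centroids


points = [(0,), (1,), (2,), (3,), (10,), (11,)]
labels, centroids = rebalance(points, [0, 0, 0, 0, 1, 1], [(1.5,), (10.5,)])
print(labels, centroids)
```

As in the description, the centroids of the receiving clusters are not updated while points are pushed into them, so the last cluster's centroid is stale on return and may need a final recomputation.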

158

answered Sep 20 '22 15:09

Fred Foo

In case anyone wants a short function to copy and paste, here you go. It runs KMeans, then finds the minimum-cost matching of points to clusters under the constraint that each cluster receives at most cluster_size points.

from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment
import numpy as np


def get_even_clusters(X, cluster_size):
    # Enough clusters to hold every point at cluster_size points each
    n_clusters = int(np.ceil(len(X) / cluster_size))
    kmeans = KMeans(n_clusters=n_clusters)
    kmeans.fit(X)
    centers = kmeans.cluster_centers_
    # Repeat each center cluster_size times: one column ("slot") per available seat
    centers = centers.reshape(-1, 1, X.shape[-1]).repeat(cluster_size, 1).reshape(-1, X.shape[-1])
    distance_matrix = cdist(X, centers)
    # Min-cost matching of points to slots; slot index // cluster_size is the cluster label
    clusters = linear_sum_assignment(distance_matrix)[1] // cluster_size
    return clusters
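To see why repeating the centers enforces the size limit, here is a small illustration that swaps `linear_sum_assignment` for a brute-force search over all point-to-slot assignments. It is only viable for a handful of points, and the name `assign_to_slots` is made up for this sketch. Each center is duplicated `cluster_size` times into "slots", every point gets its own slot, and the slot index divided by `cluster_size` gives the cluster.

```python
from itertools import permutations
from math import dist


def assign_to_slots(points, centers, cluster_size):
    """Brute-force slot assignment; a stand-in for linear_sum_assignment."""
    # Slot s belongs to center s // cluster_size, mirroring the repeat() step.
    slots = [c for c in centers for _ in range(cluster_size)]
    # Try every way of giving each point a distinct slot, keep the cheapest.
    best = min(
        permutations(range(len(slots)), len(points)),
        key=lambda perm: sum(dist(p, slots[s]) for p, s in zip(points, perm)),
    )
    return [s // cluster_size for s in best]


# Three points sit near the first center, but each center has only two slots.
points = [(0.0,), (1.0,), (2.0,), (9.0,)]
centers = [(1.0,), (9.0,)]
labels = assign_to_slots(points, centers, cluster_size=2)
print(labels)
```

Plain nearest-center assignment would put three points in the first cluster. With two slots per center, the cheapest matching has to send one of them to the second center instead.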

answered Sep 18 '22 15:09

Eyal Shulman
