So let's say I have an array like this:
[1,1,2,3,10,11,13,67,71]
Is there a convenient way to partition the array into something like this?
[[1,1,2,3],[10,11,13],[67,71]]
I looked through similar questions, but most people suggested using k-means to cluster points (e.g. via scipy), which is quite confusing to use for a beginner like me. Also, I think k-means is more suitable for clustering in two or more dimensions, right? Are there any ways to partition an array of N numbers into partitions/clusters depending on the numbers themselves?
Some people also suggest rigid range partitioning, but it doesn't always produce the expected results.
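For example, here is a quick sketch of what I mean (using numpy's digitize; the choice of three equal-width bins is just for illustration). On my array, the middle bin comes out empty and everything below 24 lands in one bin:

    import numpy as np

    data = [1, 1, 2, 3, 10, 11, 13, 67, 71]

    # Three equal-width bins over [1, 71]: edges at about 24.3 and 47.7.
    edges = np.linspace(min(data), max(data), 4)[1:-1]
    print(np.digitize(data, edges))  # [0 0 0 0 0 0 0 2 2] -- bin 1 is empty,
                                     # and 1..13 is lumped into a single bin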
A 1-d distribution can have 3 natural clusters where two hold 10% of the data each and the last one contains 80% of the data. So I think it is possible to cluster here, although I agree it makes sense to optimize the run by picking seeds smartly, or by using other ideas.
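For instance, a minimal scikit-learn sketch on the array from the question (the reshape is needed because KMeans expects 2-d input; the k-means++ init is one way of "picking seeds smartly"):

    import numpy as np
    from sklearn.cluster import KMeans

    data = np.array([1, 1, 2, 3, 10, 11, 13, 67, 71]).reshape(-1, 1)
    km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(data)
    print(km.labels_)  # e.g. [0 0 0 0 1 1 1 2 2] -- the three natural groups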
1. Overview. Clustering is the task of dividing data points into groups such that points in the same group are more similar to each other than to points in other groups.
We can use it to perform dimensionality reduction, where each transformed feature is the distance of the point from a cluster center. When using it for anomaly detection, we measure the distance of each point from its closest cluster center.
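A minimal sketch of both uses with scikit-learn's KMeans (the 3-cluster count and the example array are assumptions for illustration):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([1, 1, 2, 3, 10, 11, 13, 67, 71], dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Dimensionality reduction: transform() returns each point's distance
    # to every cluster center.
    features = km.transform(X)  # shape (9, 3)

    # Anomaly detection: distance to the closest center; unusually
    # large values stand out.
    print(features.min(axis=1))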
What is the minimum number of variables/features required to perform clustering? A single variable is enough, and clustering analysis on a single variable can be visualized with the help of a histogram.
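As a quick sketch with matplotlib, using the array from the question:

    import matplotlib.pyplot as plt

    data = [1, 1, 2, 3, 10, 11, 13, 67, 71]
    plt.hist(data, bins=20)  # three separated groups of bars = three clusters
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()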
Don't use multidimensional clustering algorithms for a one-dimensional problem. A single dimension is much more special than you might naively think, because you can actually sort it, which makes things a lot easier.
In fact, it is usually not even called clustering, but e.g. segmentation or natural breaks optimization.
You might want to look at Jenks Natural Breaks Optimization and similar statistical methods. Kernel Density Estimation is also a good method to look at, with a strong statistical background: local minima in the density are good places to split the data into clusters, with statistical reasons to do so. KDE is maybe the most sound method for clustering 1-dimensional data.
With KDE, it again becomes obvious that 1-dimensional data is much better behaved. In 1D you have local minima; but in 2D you may have saddle points and similar ambiguous splitting points. See the Wikipedia illustration of a saddle point for an example of how such a point may or may not be an appropriate place to split clusters.
See this answer for an example of how to do this in Python (in its plot, green markers are the cluster modes; red markers are points where the data is cut; the y-axis is the log-likelihood of the density).
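A sketch along the same lines (the Gaussian kernel and the bandwidth of 3 are assumptions you would tune; KernelDensity comes from scikit-learn, argrelextrema from scipy):

    import numpy as np
    from scipy.signal import argrelextrema
    from sklearn.neighbors import KernelDensity

    a = np.array([1, 1, 2, 3, 10, 11, 13, 67, 71], dtype=float).reshape(-1, 1)

    # Fit a kernel density estimate; the bandwidth controls the smoothing.
    kde = KernelDensity(kernel="gaussian", bandwidth=3).fit(a)
    s = np.linspace(a.min() - 1, a.max() + 1, 500).reshape(-1, 1)
    e = kde.score_samples(s)  # log-density along the grid

    # Local maxima of the density are the cluster modes;
    # local minima are the cut points.
    maxima = argrelextrema(e, np.greater)[0]
    minima = argrelextrema(e, np.less)[0]
    print("modes:", s[maxima].ravel())
    cuts = s[minima].ravel()
    print("cut points:", cuts)

    # Split the data at the cut points.
    clusters = [a[(a >= lo) & (a < hi)] for lo, hi in
                zip(np.r_[-np.inf, cuts], np.r_[cuts, np.inf])]
    print(clusters)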
This simple algorithm works:
    points = [0.1, 0.31, 0.32, 0.45, 0.35, 0.40, 0.5]
    clusters = []
    eps = 0.2
    points_sorted = sorted(points)
    curr_point = points_sorted[0]
    curr_cluster = [curr_point]
    for point in points_sorted[1:]:
        if point <= curr_point + eps:
            # Close enough to the previous point: same cluster.
            curr_cluster.append(point)
        else:
            # Gap larger than eps: close the current cluster, start a new one.
            clusters.append(curr_cluster)
            curr_cluster = [point]
        curr_point = point
    clusters.append(curr_cluster)
    print(clusters)
The above example clusters points into groups such that each element in a group is at most eps away from another element in the group. This is like the clustering algorithm DBSCAN with eps=0.2, min_samples=1. As others noted, 1-d data allows you to solve the problem directly, instead of reaching for bigger guns like DBSCAN.
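To check the equivalence, a small sketch with scikit-learn's DBSCAN (the reshape is needed because it expects 2-d input):

    import numpy as np
    from sklearn.cluster import DBSCAN

    points = [0.1, 0.31, 0.32, 0.45, 0.35, 0.40, 0.5]
    labels = DBSCAN(eps=0.2, min_samples=1).fit_predict(
        np.array(points).reshape(-1, 1))
    print(labels)  # e.g. [0 1 1 1 1 1 1] -- 0.1 is its own cluster,
                   # the rest chain together within eps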
On some small datasets (fewer than 1000 elements) that I tested, the above algorithm was 10-100x faster than DBSCAN.