Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How would you group/cluster these three areas in arrays in python?

So you have an array

1
2
3
60
70
80
100
220
230
250

For a better understanding:

For better understanding

How would you group/cluster the three areas in arrays in python(v2.6), so you get three arrays in this case containing

[1 2 3] [60 70 80 100] [220 230 250]

Background:

y-axis is frequency, x-axis is number. These numbers are the ten highest amplitudes being represented by their frequencies. I want to create three discrete numbers from them for pattern recognition. There could be many more points but all of them are grouped by a relatively big frequency difference as you can see in this example between about 50 and about 0 and between about 100 and about 220. Note that what is big and what is small changes but the difference between clusters remains significant compared to the difference between elements of a group/cluster.

like image 635
Zurechtweiser Avatar asked Jan 20 '12 10:01

Zurechtweiser


People also ask

How do you cluster an array?

Add a cluster to the front panel. Add the Cluster To Array function to the block diagram. Wire the cluster to the Cluster To Array function. Right-click the Cluster To Array function and select Create»Indicator from the shortcut menu to create an array indicator.

How do you cluster data in Python?

Python offers many useful tools for performing cluster analysis. The best tool to use depends on the problem at hand and the type of data available. There are three widely used techniques for how to form clusters in Python: K-means clustering, Gaussian mixture models and spectral clustering.


2 Answers

Observe that your data points are actually one-dimensional if x just represents an index. You can cluster your points using Scipy's cluster.vq module, which implements the k-means algorithm.

>>> import numpy as np
>>> from scipy.cluster.vq import kmeans, vq
>>> y = np.array([1,2,3,60,70,80,100,220,230,250])
>>> codebook, _ = kmeans(y, 3)  # three clusters
>>> cluster_indices, _ = vq(y, codebook)
>>> cluster_indices
array([1, 1, 1, 0, 0, 0, 0, 2, 2, 2])

The result means: the first three points form cluster 1 (an arbitrary label), the next four form cluster 0 and the last three form cluster 2. Grouping the original points according to the indices is left as an exercise for the reader.

For more clustering algorithms in Python, check out scikit-learn.

like image 111
Fred Foo Avatar answered Oct 14 '22 06:10

Fred Foo


This is a simple algorithm implemented in python that check whether or not a value is too far (in terms of standard deviation) from the mean of a cluster:

from math import sqrt

def stat(lst):
    """Calculate mean and std deviation from the input list."""
    n = float(len(lst))
    mean = sum(lst) / n
    stdev = sqrt((sum(x*x for x in lst) / n) - (mean * mean)) 
    return mean, stdev

def parse(lst, n):
    cluster = []
    for i in lst:
        if len(cluster) <= 1:    # the first two values are going directly in
            cluster.append(i)
            continue

        mean,stdev = stat(cluster)
        if abs(mean - i) > n * stdev:    # check the "distance"
            yield cluster
            cluster[:] = []    # reset cluster to the empty list

        cluster.append(i)
    yield cluster           # yield the last cluster

This will return what you expect in your example with 5 < n < 9:

>>> array = [1, 2, 3, 60, 70, 80, 100, 220, 230, 250]
>>> for cluster in parse(array, 7):
...     print(cluster)
[1, 2, 3]
[60, 70, 80, 100]
[220, 230, 250]
like image 30
Rik Poggi Avatar answered Oct 14 '22 06:10

Rik Poggi