Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding clusters of numbers in a list

Tags:

python

list

I'm struggling with that, since I'm sure that a dozen for-loops is not the solution for this problem:

There is a sorted list of numbers like

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]

and I want to create a dict with lists of numbers, wherein the difference of the numbers (following each other) is not more than 15. So the output would be this:

clusters = {
    1 : [123, 124, 128],
    2 : [160, 167],
    3 : [213, 215, 230, 245, 255, 257],
    4 : [400, 401, 402],
    5 : [430]
}

My current solution is a bit ugly (I have to remove duplicates at the end…), I'm sure it can be done in a pythonic way.

This is what I do now:

clusters = {}  
dIndex = 0 
for i in range(len(numbers)-1) :
    if numbers[i+1] - numbers[i] <= 15 :
        if not clusters.has_key(dIndex) : clusters[dIndex] = []
        clusters[dIndex].append(numbers[i])
        clusters[dIndex].append(numbers[i+1])
    else : dIndex += 1
like image 591
tamasgal Avatar asked Apr 04 '13 01:04

tamasgal


People also ask

How do I find clusters in Python?

1) To know the number of clusters you have to define a threshold that will tell the algorithm that how much two tuples must differ before they can be considered as belonging to two different groups. For example, consider these two groups of coins: 5 cents, and 2 cents, such that each of them has a different weight.


2 Answers

Not strictly necessary if your list is small, but I'd probably approach this in a "stream-processing" fashion: define a generator that takes your input iterable, and yields the elements grouped into runs of numbers differing by <= 15. Then you can use that to generate your dictionary easily.

def grouper(iterable):
    prev = None
    group = []
    for item in iterable:
        if prev is None or item - prev <= 15:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
dict(enumerate(grouper(numbers), 1))

prints:

{1: [123, 124, 128],
 2: [160, 167],
 3: [213, 215, 230, 245, 255, 257],
 4: [400, 401, 402],
 5: [430]}

As a bonus, this lets you even group your runs for potentially-infinite lists (as long as they're sorted, of course). You could also stick the index generation part into the generator itself (instead of using enumerate) as a minor enhancement.

like image 196
tzaman Avatar answered Oct 14 '22 04:10

tzaman


import itertools
import numpy as np

numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430])
nd = [0] + list(np.where(np.diff(numbers) > 15)[0] + 1) + [len(numbers)]

a, b = itertools.tee(nd)
next(b, None)
res = {}
for j, (f, b) in enumerate(itertools.izip(a, b)):
    res[j] = numbers[f:b]

If you can use itertools and numpy. Adapted pairwise for the iterator tricks. The +1 is needed to shift the index, adding the 0 and len(numbers) onto the list makes sure the first and last entries are included correctly.

You can obviously do this with out itertools, but I like tee.

like image 27
tacaswell Avatar answered Oct 14 '22 03:10

tacaswell