Finding clusters of numbers in a list

Tags: python, list

I'm struggling with this, and I'm sure that a dozen for-loops is not the right solution to the problem:

There is a sorted list of numbers like

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]

and I want to create a dict of lists in which the difference between consecutive numbers is at most 15. The expected output is:

clusters = {
    1 : [123, 124, 128],
    2 : [160, 167],
    3 : [213, 215, 230, 245, 255, 257],
    4 : [400, 401, 402],
    5 : [430]
}

My current solution is a bit ugly (I have to remove duplicates at the end…), and I'm sure it can be done in a more pythonic way.

This is what I do now:

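clusters = {}
dIndex = 0
for i in range(len(numbers) - 1):
    if numbers[i+1] - numbers[i] <= 15:
        # dict.has_key() was removed in Python 3; "in" works in both versions
        if dIndex not in clusters:
            clusters[dIndex] = []
        # Both neighbours are appended, which is where the duplicates come from
        clusters[dIndex].append(numbers[i])
        clusters[dIndex].append(numbers[i+1])
    else:
        dIndex += 1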


asked Apr 04 '13 01:04

tamasgal

2 Answers

Not strictly necessary if your list is small, but I'd probably approach this in a "stream-processing" fashion: define a generator that takes your input iterable, and yields the elements grouped into runs of numbers differing by <= 15. Then you can use that to generate your dictionary easily.

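def grouper(iterable):
    """Yield lists of consecutive items whose neighbours differ by at most 15."""
    prev = None
    group = []
    for item in iterable:
        if prev is None or item - prev <= 15:
            # First item, or close enough to the previous one: extend the current run
            group.append(item)
        else:
            # Gap larger than 15: emit the finished run and start a new one
            yield group
            group = [item]
        prev = item
    if group:
        # Emit the last run (skipped when the input was empty)
        yield group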

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
dict(enumerate(grouper(numbers), 1))

which evaluates to:

{1: [123, 124, 128],
 2: [160, 167],
 3: [213, 215, 230, 245, 255, 257],
 4: [400, 401, 402],
 5: [430]}

As a bonus, this even lets you group runs from potentially infinite iterables (as long as they're sorted, of course). You could also move the index generation into the generator itself (instead of using enumerate) as a minor enhancement.
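Here is a rough sketch of both ideas from the paragraph above: numbering the runs inside the generator and consuming an endless sorted stream lazily. The names `numbered_grouper`, `max_gap`, `start` and `bursts` are made up for this illustration and are not part of the answer's code. Treat it as one possible variant under those assumptions, not the definitive way to do it.

```python
import itertools


def numbered_grouper(iterable, max_gap=15, start=1):
    """Yield (index, run) pairs for a sorted iterable (hypothetical variant of grouper)."""
    index = start
    group = []
    for item in iterable:
        # A gap larger than max_gap closes the current run
        if group and item - group[-1] > max_gap:
            yield index, group
            index += 1
            group = []
        group.append(item)
    if group:
        # Only reached for finite input: emit the final run
        yield index, group


def bursts():
    """Hypothetical endless sorted stream: 0, 1, 2, 100, 101, 102, 200, ..."""
    for base in itertools.count(0, 100):
        yield base
        yield base + 1
        yield base + 2


# islice stops after three runs, so the infinite stream is never exhausted
first_three = list(itertools.islice(numbered_grouper(bursts()), 3))

# The same generator feeds dict() directly for a finite list
numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
clusters = dict(numbered_grouper(numbers))
```

Note that a run is only yielded once the first element of the next run has arrived, so on a live stream the most recent run always lags behind by one gap.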


answered Oct 14 '22 04:10

tzaman

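import itertools
import numpy as np

numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430])
# Cluster boundaries: 0, every index that follows a gap larger than 15, and len(numbers)
nd = [0] + list(np.where(np.diff(numbers) > 15)[0] + 1) + [len(numbers)]

# pairwise: walk over consecutive (start, stop) boundary pairs
a, b = itertools.tee(nd)
next(b, None)
res = {}
# zip replaces itertools.izip, which only exists in Python 2
for j, (start, stop) in enumerate(zip(a, b)):
    res[j] = numbers[start:stop]  # keys start at 0; values are numpy arrays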

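This works if you can use itertools and numpy. The iterator trick is adapted from the itertools pairwise recipe. The +1 is needed because np.diff reports each gap at the index of its left-hand element, so the index has to be shifted to the first element of the next cluster. Adding 0 and len(numbers) onto the list makes sure the first and last clusters are included correctly.

You can obviously do this without itertools, but I like tee.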
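For comparison, here is a minimal sketch of the same idea without itertools, assuming numpy is available. It swaps the tee/pairwise step for np.split, which is a different technique from the one in the code above. The name `split_points` is made up for this example.

```python
import numpy as np

numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430])

# np.diff(numbers)[i] is numbers[i + 1] - numbers[i]; the +1 turns the position
# of each large gap into the index where the next cluster starts
split_points = np.where(np.diff(numbers) > 15)[0] + 1

# np.split cuts the array at those indices; tolist() converts each chunk
# back to a plain Python list, and enumerate(..., 1) gives 1-based keys
clusters = {i: chunk.tolist() for i, chunk in enumerate(np.split(numbers, split_points), 1)}
```

Here np.split takes care of the first and last cluster on its own, so the explicit 0 and len(numbers) boundaries are not needed.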

answered Oct 14 '22 03:10

tacaswell
