I'm struggling with that, since I'm sure that a dozen for-loops is not the solution for this problem:
There is a sorted list of numbers like
numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
and I want to create a dict with lists of numbers, wherein the difference of the numbers (following each other) is not more than 15. So the output would be this:
clusters = {
1 : [123, 124, 128],
2 : [160, 167],
3 : [213, 215, 230, 245, 255, 257],
4 : [400, 401, 402],
5 : [430]
}
My current solution is a bit ugly (I have to remove duplicates at the end…), I'm sure it can be done in a pythonic way.
This is what I do now:
clusters = {}
dIndex = 0
for i in range(len(numbers)-1) :
if numbers[i+1] - numbers[i] <= 15 :
if not clusters.has_key(dIndex) : clusters[dIndex] = []
clusters[dIndex].append(numbers[i])
clusters[dIndex].append(numbers[i+1])
else : dIndex += 1
1) To know the number of clusters you have to define a threshold that will tell the algorithm that how much two tuples must differ before they can be considered as belonging to two different groups. For example, consider these two groups of coins: 5 cents, and 2 cents, such that each of them has a different weight.
Not strictly necessary if your list is small, but I'd probably approach this in a "stream-processing" fashion: define a generator that takes your input iterable, and yields the elements grouped into runs of numbers differing by <= 15. Then you can use that to generate your dictionary easily.
def grouper(iterable):
prev = None
group = []
for item in iterable:
if prev is None or item - prev <= 15:
group.append(item)
else:
yield group
group = [item]
prev = item
if group:
yield group
numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
dict(enumerate(grouper(numbers), 1))
prints:
{1: [123, 124, 128],
2: [160, 167],
3: [213, 215, 230, 245, 255, 257],
4: [400, 401, 402],
5: [430]}
As a bonus, this lets you even group your runs for potentially-infinite lists (as long as they're sorted, of course). You could also stick the index generation part into the generator itself (instead of using enumerate
) as a minor enhancement.
import itertools
import numpy as np
numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430])
nd = [0] + list(np.where(np.diff(numbers) > 15)[0] + 1) + [len(numbers)]
a, b = itertools.tee(nd)
next(b, None)
res = {}
for j, (f, b) in enumerate(itertools.izip(a, b)):
res[j] = numbers[f:b]
If you can use itertools and numpy. Adapted pairwise
for the iterator tricks. The +1
is needed to shift the index, adding the 0
and len(numbers)
onto the list makes sure the first and last entries are included correctly.
You can obviously do this with out itertools
, but I like tee
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With