Let's say I have a list:
L = [15,16,57,59,14]
The list contains measurements that are not very accurate: the real value of an element is within ±2 of the recorded value. So 14, 15 and 16 can have the same value. What I want to do is uniquify that list, taking the measurement errors into account. The output should therefore be:
l_out = [15,57]
or
l_out = [(14,15,16),(57,59)]
I have no problem producing either result with a for loop. However, I am curious whether there is a more elegant solution. Ideas much appreciated.
As lazyr pointed out in the comments, a similar problem has been posted here. Using the cluster module the solution to my problem would be:
>>> from cluster import *
>>> L = [15,16,57,59,14]
>>> cl = HierarchicalClustering(L, lambda x,y: abs(x-y))
>>> cl.getlevel(2)
[[14, 15, 16], [57, 59]]
or (to get a unique list with the mean value of each group):
>>> [mean(cluster) for cluster in cl.getlevel(2)]
[15, 58]
If you want standard-library Python, itertools' groupby is your friend:
from itertools import groupby
L = [15,16,57,59,14]
# Stash state outside key function. (a little hacky).
# Better way would be to create stateful class with a __call__ key fn.
state = {'group': 0, 'prev': None}
thresh = 2
def _group(cur):
    """Group if within threshold."""
    if state["prev"] is not None and abs(state["prev"] - cur) > thresh:
        state["group"] += 1 # Advance group
    state["prev"] = cur
    return state["group"]
# Group, then drop the group key and inflate the final tuples.
l_out = [tuple(g) for _, g in groupby(sorted(L), key=_group)]
print(l_out)
# -> [(14, 15, 16), (57, 59)]
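The comment above mentions that a stateful class with a __call__ method would be cleaner than the module-level dict. Here is a minimal sketch of that idea. The names GapGrouper and group_within are made up for illustration and are not part of itertools or of the original answer.

```python
from itertools import groupby


class GapGrouper(object):
    """Hypothetical stateful key function for groupby.

    Starts a new group whenever the gap to the previous value
    is larger than thresh. Expects values in sorted order.
    """

    def __init__(self, thresh):
        self.thresh = thresh
        self.group = 0
        self.prev = None

    def __call__(self, cur):
        if self.prev is not None and abs(cur - self.prev) > self.thresh:
            self.group += 1  # Gap too large: advance to the next group
        self.prev = cur
        return self.group


def group_within(values, thresh=2):
    # A fresh GapGrouper per call keeps state from leaking between runs.
    key = GapGrouper(thresh)
    return [tuple(g) for _, g in groupby(sorted(values), key=key)]


L = [15, 16, 57, 59, 14]
l_out = group_within(L)
print(l_out)
# -> [(14, 15, 16), (57, 59)]
```

Like the dict version, this compares each value only with its immediate predecessor, so a chain such as 1, 3, 5, 7 ends up in a single group even though 1 and 7 are more than 2 apart.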