I have an array of floating-point numbers, which is unordered. I know that the values always fall around a few points, which are not known. For illustration, this list
[10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
has values clustered around 5 and 10, so I would like [5,10] as answer.
I would like to find those clusters for lists with 1000+ values, where the number of clusters is probably around 10 (for some given tolerance). How can I do that efficiently?
Check out python-cluster. With this library you could do something like this:
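from cluster import HierarchicalClustering

data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
# Distance between two values is their absolute difference.
cl = HierarchicalClustering(data, lambda x, y: abs(x - y))
# Cut the hierarchy at distance 1.0 and average each resulting cluster.
print([sum(cluster) / len(cluster) for cluster in cl.getlevel(1.0)])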
And you would get:
[5.0062, 10.003333333333332]
(This is a very silly example, because I don't really know what you want to do, and because this is the first time I've used this library)
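Since the data is one-dimensional, a simpler alternative that needs no library is to sort the values and start a new cluster wherever the gap between neighbours exceeds the tolerance. The following is only a minimal sketch under that assumption: the function name cluster_centers and the tolerance of 1.0 are made up for illustration, and it is a different technique from the hierarchical clustering above. It runs in O(n log n) because of the sort, so 1000+ values are no problem.

```python
def cluster_centers(values, tolerance):
    """Return the mean of each group of values.

    Hypothetical helper: values are sorted, and a new group starts
    whenever the gap to the previous value is larger than tolerance.
    """
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v - groups[-1][-1] > tolerance:
            groups.append([v])      # gap too large: open a new group
        else:
            groups[-1].append(v)    # close enough: extend the current group
    return [sum(g) / len(g) for g in groups]


data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
centers = cluster_centers(data, tolerance=1.0)
print([round(c, 2) for c in centers])  # prints [5.01, 10.0]
```

Note that this chains neighbours together, so a long run of values each within the tolerance of the next ends up in one cluster even if its ends are far apart. If your clusters are well separated, as in the example, that should not matter.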