 

Python KMeans clustering words

I am interested in performing k-means clustering on a list of words, using Levenshtein distance as the distance measure.

1) I know there are a lot of frameworks out there, including scipy and orange, that have a k-means implementation. However, they all require the data to be some sort of vector, which doesn't fit my use case.

2) I need a good clustering implementation. I looked at python-clustering and realized that it a) doesn't return the sum of the distances from each point to its centroid, and b) doesn't have any sort of iteration limit or cutoff to keep the quality of the clustering in check. Neither python-clustering nor the clustering algorithm on daniweb really works for me.

Can someone recommend a good library? Google hasn't been my friend.
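One caveat worth noting: classic k-means updates each centroid to the mean of its points, and a "mean" of strings is undefined, which is why vector-based libraries don't fit here. A common substitute when only a pairwise distance is available is k-medoids, where each cluster center is one of the cluster's own members. Below is a minimal sketch (a simple alternating assignment/update variant, not PAM) that covers the two requirements above: it stops after `max_iter` iterations or when the medoids stop changing, and it returns the total distance of every word to its medoid. The function names, the random initialization and the `seed` parameter are my own assumptions, not part of any library.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def assign(words, medoids):
    """Attach every word to its nearest medoid (ties go to the earlier medoid)."""
    clusters = {m: [] for m in medoids}
    for w in words:
        nearest = min(medoids, key=lambda m: levenshtein(w, m))
        clusters[nearest].append(w)
    return clusters


def k_medoids(words, k, max_iter=100, seed=None):
    """Hypothetical k-medoids over strings.

    Returns (medoids, clusters, total_cost), where total_cost is the sum of
    the distances from each word to the medoid of its cluster.
    """
    words = list(dict.fromkeys(words))  # drop duplicates so medoids stay distinct
    if not 1 <= k <= len(words):
        raise ValueError("k must be between 1 and the number of distinct words")
    rng = random.Random(seed)
    medoids = rng.sample(words, k)  # assumption: random initial medoids
    for _ in range(max_iter):  # hard iteration limit
        clusters = assign(words, medoids)
        # Update step: the new medoid minimizes the summed distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(levenshtein(c, o) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):  # converged
            break
        medoids = new_medoids
    clusters = assign(words, medoids)  # final assignment for the returned medoids
    total_cost = sum(levenshtein(w, m) for m, ms in clusters.items() for w in ms)
    return medoids, clusters, total_cost


medoids, clusters, cost = k_medoids(
    ["cat", "bat", "hat", "house", "mouse", "louse"], k=2, seed=1)
print(medoids, clusters, cost)
```

Like k-means, this only finds a local optimum that depends on the initial medoids, so in practice you would run it with several seeds and keep the result with the lowest `total_cost`.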

asked Mar 17 '10 03:03 by sadawd



1 Answer

Yeah, I don't think there is a good implementation of what I need.

I have some unusual requirements, such as distance caching.

So I think I will just write my own library and release it under GPLv3 soon.

answered Oct 14 '22 11:10 by sadawd
