I want to run some experiments on semi-supervised (constrained) clustering, in particular with background knowledge provided as instance-level pairwise constraints (Must-Link or Cannot-Link constraints). Are there any good open-source packages that implement semi-supervised clustering? I looked at PyBrain, mlpy, scikit-learn and Orange, and couldn't find any constrained clustering algorithms. In particular, I'm interested in constrained K-Means or constrained density-based clustering algorithms (like C-DBSCAN). Packages in Matlab, Python, Java or C++ would be preferred, but other languages are fine too.
K-means is one of the most commonly used clustering algorithms. It is a centroid-based, unsupervised algorithm that tries to minimize the variance of the data points within each cluster, i.e. the sum of squared distances from each point to its cluster centroid.
Clustering methods are broadly divided into two types: hierarchical and non-hierarchical methods.
The Python package scikit-learn now has algorithms for Ward hierarchical clustering (since 0.15) and agglomerative clustering (since 0.14) that support connectivity constraints.
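For what it's worth, here is a minimal sketch of what connectivity-constrained agglomerative clustering looks like in scikit-learn. The toy data, the choice of a 5-nearest-neighbour graph and the variable names are my own assumptions for illustration. Note that a connectivity graph only restricts which clusters may be merged; it is not the same thing as Must-Link/Cannot-Link constraints.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)

# Toy data (assumption): two well-separated 2-D blobs of 20 points each.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# Sparse k-nearest-neighbour graph used as the connectivity constraint:
# during agglomeration, merges are restricted to clusters linked in this graph.
connectivity = kneighbors_graph(X, n_neighbors=5, include_self=False)

model = AgglomerativeClustering(
    n_clusters=2,
    linkage="ward",
    connectivity=connectivity,
)
labels = model.fit_predict(X)
print(labels)
```

If the neighbour graph is split into several connected components (as it likely is for blobs this far apart), scikit-learn may emit a warning and complete the graph itself so that the tree can still be built.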
Besides, I do have a real-world application, namely the identification of tracks from cell positions, where each track can contain only one position from each time point.
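One way to read the tracking requirement is as a set of Cannot-Link constraints: any two positions observed at the same time point must end up in different tracks (clusters). A small sketch of how such pairs could be generated follows. The function name and the input layout (one time-point value per position, in index order) are assumptions made for illustration, not part of any package mentioned here.

```python
from collections import defaultdict
from itertools import combinations


def cannot_links_from_time_points(time_points):
    """Return sorted index pairs (i, j), i < j, of positions sharing a time point."""
    by_time = defaultdict(list)
    for idx, t in enumerate(time_points):
        by_time[t].append(idx)

    pairs = []
    for indices in by_time.values():
        # Every pair of positions within one time point is a cannot-link pair.
        pairs.extend(combinations(indices, 2))
    return sorted(pairs)


# Hypothetical example: six cell positions observed at time points 0, 1 and 2.
time_points = [0, 0, 1, 1, 1, 2]
cannot_link = cannot_links_from_time_points(time_points)
print(cannot_link)  # [(0, 1), (2, 3), (2, 4), (3, 4)]
```

The resulting list can be handed to any constrained clustering routine that accepts Cannot-Link constraints as index pairs.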
The R package conclust implements a number of algorithms:
There are 4 main functions in this package: ckmeans(), lcvqe(), mpckm() and ccls(). They take an unlabeled dataset and two lists of must-link and cannot-link constraints as input and produce a clustering as output.
There's also an implementation of COP-KMeans in Python.
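To give an idea of what COP-KMeans does, here is a rough pure-Python sketch of the basic idea: ordinary K-Means, except that each point is assigned to the nearest centroid that does not violate a constraint with an already-assigned point, and the run fails if no such cluster exists. This is my own simplified illustration, not the code of the implementation mentioned above; the function names, the random initialisation and the `None` failure value are all assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def violates(i, cluster, labels, must_link, cannot_link):
    """True if assigning point i to `cluster` breaks a constraint with a
    point that has already been assigned (label None = not assigned yet)."""
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            if labels[other] is not None and labels[other] != cluster:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            if labels[other] == cluster:
                return True
    return False


def cop_kmeans(points, k, must_link=(), cannot_link=(), max_iter=100, seed=0):
    """Return a list of cluster labels, or None if some point has no feasible cluster."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest centre.
            order = sorted(range(k), key=lambda c: sq_dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                return None  # no cluster satisfies the constraints for point i
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old centre if a cluster became empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: two groups of three 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (5.2, 5.1)]
labels = cop_kmeans(points, k=2, must_link=[(0, 1)], cannot_link=[(0, 5)])
print(labels)
```

Because assignment is greedy and order-dependent, a sketch like this can fail on constraint sets that are actually satisfiable; published variants add things like transitive closure of the constraints to reduce that.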