I have implemented several clustering algorithms on an image dataset and I'm interested in deriving the success rate of the clustering. I have to detect the tumor area; in the original image I know where the tumor is located. I would like to compare the two images and obtain a percentage of success. The images are:
Original image: the position of the cancer is known
Image after clustering algorithm
I'm using Python 2.7.
Clustering Performance Evaluation Metrics
Clusters are evaluated based on some similarity or dissimilarity measure, such as the distance between cluster points. If the clustering algorithm keeps dissimilar observations apart and groups similar observations together, then it has performed well.
The C-H (Calinski-Harabasz) Index is a great way to evaluate the performance of a clustering algorithm because it does not require knowledge of the ground-truth labels. The higher the index, the better the performance.
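As a minimal sketch, assuming your pixel features are available as an (n_pixels, n_features) NumPy array (the `features` and `labels` names below are placeholders, not anything from the original post), scikit-learn exposes this index directly:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score  # older scikit-learn spells it calinski_harabaz_score

features = np.random.rand(1000, 3)                   # stand-in for real pixel features
labels = KMeans(n_clusters=2).fit_predict(features)  # your clustering result

print("C-H index: %.2f" % calinski_harabasz_score(features, labels))  # higher is better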
You can evaluate the performance of k-means by its convergence rate and by the sum of squared error (SSE), comparing SSE across runs or across values of k. SSE is analogous to the sum of the clusters' moments of inertia.
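A minimal sketch of that comparison, assuming scikit-learn's k-means, which exposes the SSE of a fitted model as the `inertia_` attribute:

```python
import numpy as np
from sklearn.cluster import KMeans

features = np.random.rand(1000, 3)  # stand-in for real pixel features
for k in (2, 3, 4, 5):
    km = KMeans(n_clusters=k, n_init=10).fit(features)
    print("k=%d  SSE=%.2f" % (k, km.inertia_))  # lower SSE means tighter clusters
```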
Computing accuracy for clustering can be done by reordering the rows (or columns) of the confusion matrix so that the sum of the diagonal values is maximal. This linear assignment problem can be solved in O(n³) instead of O(n!). The Coclust library provides an implementation of accuracy for clustering results.
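The same idea can be sketched directly with SciPy's Hungarian solver instead of the Coclust library (the label arrays below are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def clustering_accuracy(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    rows, cols = linear_sum_assignment(-cm)  # negate: the solver minimizes cost
    return cm[rows, cols].sum() / float(cm.sum())

y_true = np.array([0, 0, 1, 1, 1, 0])  # ground-truth labels (tumor / not tumor)
y_pred = np.array([1, 1, 0, 0, 0, 1])  # cluster ids with arbitrary numbering
print("clustering accuracy: %.2f" % clustering_accuracy(y_true, y_pred))  # 1.00 after relabeling
```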
Segmentation Accuracy
This is a pretty common problem addressed in the image segmentation literature; here, for example, is a StackOverflow post.
One common approach is pixel-wise accuracy: the ratio of "correct pixels" to all pixels, which is standard in image segmentation work (e.g., Mask R-CNN, PixelNet), including safety-critical domains.
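A minimal sketch, assuming you can binarize both images into boolean "tumor" masks of identical shape (the mask names here are illustrative); the Dice coefficient is included because it is a common companion to pixel accuracy in segmentation:

```python
import numpy as np

def pixel_accuracy(gt_mask, pred_mask):
    return np.mean(gt_mask == pred_mask)  # fraction of pixels labeled correctly

def dice_coefficient(gt_mask, pred_mask):
    intersection = np.logical_and(gt_mask, pred_mask).sum()
    return 2.0 * intersection / (gt_mask.sum() + pred_mask.sum())

gt = np.zeros((100, 100), dtype=bool);   gt[30:60, 30:60] = True    # known tumor region
pred = np.zeros((100, 100), dtype=bool); pred[35:65, 35:65] = True  # clustered tumor region
print("pixel accuracy: %.3f  Dice: %.3f" % (pixel_accuracy(gt, pred),
                                            dice_coefficient(gt, pred)))
```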
Treating it as more of an object detection task, you could take the overlap of the hulls of the objects and measure accuracy (commonly broken down into precision, recall, F-score, and other measures with various biases/skews). This also lets you produce an ROC curve that can be calibrated for false positives/false negatives.
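A minimal sketch of that overlap measure for two axis-aligned bounding boxes given as (x_min, y_min, x_max, y_max); the 0.5 IoU threshold below is just a common convention, not a requirement:

```python
def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / float(union)

gt_box = (30, 30, 60, 60)    # ground-truth tumor bounding box
pred_box = (35, 35, 65, 65)  # box around the detected cluster
iou = box_iou(gt_box, pred_box)
print("IoU: %.2f  correct detection: %s" % (iou, iou >= 0.5))
```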
There is no domain-agnostic consensus on which is correct; KITTI, for example, provides both.
Mask R-CNN is open-source, state-of-the-art, and provides Python implementations of both kinds of evaluation.
In your domain (medicine), standard statistical rules apply. Use a holdout set. Cross validate. Etc. (*)
Note: although the literature space is dauntingly large, I'd encourage you to look at some domain-relevant papers, as they may take fewer "statistical shortcuts" than other vision projects (e.g., digit recognition) accept.
Python
Besides the Mask R-CNN links above, scikit-learn provides some extremely user-friendly tools and is considered part of the standard scientific "stack" for Python.
Computing the difference between images in Python is trivial with NumPy. Here's an overkill SO link.
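A minimal sketch of such a per-pixel difference, split into false-positive and false-negative pixel counts (mask names are illustrative):

```python
import numpy as np

gt = np.zeros((100, 100), dtype=bool);   gt[30:60, 30:60] = True    # known tumor
pred = np.zeros((100, 100), dtype=bool); pred[35:65, 35:65] = True  # clustered tumor

diff = np.logical_xor(gt, pred)              # all disagreeing pixels
false_pos = np.logical_and(pred, ~gt).sum()  # flagged as tumor, actually healthy
false_neg = np.logical_and(gt, ~pred).sum()  # tumor pixels that were missed
print("differing=%d  false_pos=%d  false_neg=%d" % (diff.sum(), false_pos, false_neg))
```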
Bounding-box intersection in Python is easy to implement on your own; I'd use a library like Shapely if you want to measure general polygon intersection.
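A minimal sketch with Shapely (assuming it is installed, e.g. via `pip install shapely`), computing the IoU of two arbitrary polygons such as hulls of the tumor regions:

```python
from shapely.geometry import Polygon

gt_poly = Polygon([(30, 30), (60, 30), (60, 60), (30, 60)])    # ground-truth hull
pred_poly = Polygon([(35, 35), (65, 35), (65, 65), (35, 65)])  # detected-cluster hull

inter = gt_poly.intersection(pred_poly).area
union = gt_poly.union(pred_poly).area
print("polygon IoU: %.2f" % (inter / union))
```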
Scikit-learn also has some nice machine-learning evaluation tools, for example the precision, recall, and F1 scores in sklearn.metrics.
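A minimal sketch: treat every pixel as a binary prediction and score it with those standard classification metrics (mask names are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

gt = np.zeros((100, 100), dtype=bool);   gt[30:60, 30:60] = True    # known tumor
pred = np.zeros((100, 100), dtype=bool); pred[35:65, 35:65] = True  # clustered tumor

y_true, y_pred = gt.ravel(), pred.ravel()
print("precision=%.3f  recall=%.3f  f1=%.3f" % (precision_score(y_true, y_pred),
                                                recall_score(y_true, y_pred),
                                                f1_score(y_true, y_pred)))
```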
Literature Searching
One reason you may have trouble searching for the answer is that you're trying to measure the performance of an unsupervised method, clustering, in a supervised learning arena. "Clusters" are fundamentally under-defined in mathematics (**). You want to be looking at the supervised learning literature for accuracy measures.
There is literature on unsupervised learning/clustering, too, which looks for topological structure, generally. Here's a very introductory summary. I don't think that is what you want.
A common problem, especially at scale, is that supervised methods require labels, which can be time consuming to produce accurately for dense segmentation. Object detection makes it a little easier.
There are some existing datasets for medicine ([1], [2], e.g.) and some ongoing research in label-less metrics. If none of these are options for you, then you may have to revert to considering it an unsupervised problem, but evaluation becomes very different in scope and utility.
Footnotes
[*] Vision people sometimes skip cross-validation even though they shouldn't, mainly because the models are slow to fit and they're a lazy bunch. Please don't skip a train/test/validation split, or your results may be dangerously useless.
[**] You can find all sorts of "formal" definitions, but you'll never find two people who agree on which one is correct or most useful. Here's some denser reading.