How to find the success rate of a clustering algorithm?

Tags: python, image-processing, cluster-analysis, analysis

I have implemented several clustering algorithms on an image dataset, and I'm interested in measuring the success rate of the clustering. I need to detect the tumor area. In the original image I know where the tumor is located, so I would like to compare the two images and obtain a success percentage. See the following images:

Original image: I know the position of the cancer

Image after clustering algorithm

I'm using Python 2.7.

asked Jul 25 '18 17:07

GuroTozzi

1 Answer

Segmentation Accuracy

This is a pretty common problem in the image segmentation literature; e.g., here is a StackOverflow post.

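One common approach is to compare the number of "correct pixels" with the number of "incorrect pixels" (usually reported as pixel accuracy, the fraction of pixels labeled correctly). This is common in image segmentation, including safety-critical domains; see, e.g., Mask R-CNN and PixelNet.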
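A minimal sketch of pixel accuracy, assuming the clustering output has already been turned into a boolean tumor mask with the same shape as the ground-truth mask. The function name `pixel_accuracy` and the toy masks are hypothetical, not taken from any library:

```python
import numpy as np


def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels whose predicted label matches the ground truth.

    Both inputs are assumed to be same-shaped arrays, e.g. boolean
    tumor/non-tumor masks or integer class maps.
    """
    pred_mask = np.asarray(pred_mask)
    true_mask = np.asarray(true_mask)
    if pred_mask.shape != true_mask.shape:
        raise ValueError("masks must have the same shape")
    correct = np.count_nonzero(pred_mask == true_mask)
    return correct / float(pred_mask.size)


# Toy 4x4 example: the ground-truth tumor is the 2x2 block in the middle;
# the prediction finds 3 of its 4 pixels and adds 1 false-positive pixel.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = truth.copy()
pred[2, 2] = False  # missed tumor pixel
pred[0, 0] = True   # spurious pixel

print(pixel_accuracy(pred, truth))  # 14 of 16 pixels agree -> 0.875
```

Note how forgiving this metric is when the tumor is small: predicting "no tumor" everywhere would already score 12/16 = 0.75 on this toy mask, which is one reason overlap-based scores are often reported alongside it.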

Treating it more as an object detection task, you could take the overlap of the objects' hulls and measure accuracy (commonly broken down into precision, recall, F-score, and other measures with various biases/skews). If your method outputs a score rather than only a hard label, this also lets you draw an ROC curve and choose an operating point that trades off false positives against false negatives.
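A sketch of the precision/recall view, assuming boolean masks with the tumor as the positive class. It works on the masks directly; if you want hulls as described above, a function such as scikit-image's `skimage.morphology.convex_hull_image` could produce hull masks first. The 0.5 IoU cut-off for calling the tumor "detected" is an assumption borrowed from common object-detection benchmarks, not a fixed rule:

```python
import numpy as np


def overlap_scores(pred_mask, true_mask):
    """Pixel-level precision, recall, F1 and IoU, with tumor as positive."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.count_nonzero(pred & true)   # tumor pixels found
    fp = np.count_nonzero(pred & ~true)  # non-tumor pixels flagged as tumor
    fn = np.count_nonzero(~pred & true)  # tumor pixels missed
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / float(tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def is_detected(pred_mask, true_mask, iou_threshold=0.5):
    """Object-level hit/miss: the tumor counts as found if IoU >= threshold."""
    return overlap_scores(pred_mask, true_mask)["iou"] >= iou_threshold


# Toy masks: a 2x2 tumor with one missed pixel and one false-positive pixel.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = truth.copy()
pred[2, 2] = False
pred[0, 0] = True

scores = overlap_scores(pred, truth)
print(scores)                    # precision, recall, f1 = 0.75; iou = 0.6
print(is_detected(pred, truth))  # True
```

Precision and recall pull in opposite directions (flagging more pixels raises recall and usually lowers precision), which is why the F-score or IoU is often used as a single summary.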

There is no domain-agnostic consensus on which is correct; the KITTI benchmark, for example, provides both.

Mask R-CNN is open-source and state-of-the-art, and provides Python implementations of:

Computing image matching between segmented and original
Displaying the differences
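For the "displaying the differences" part, here is a small stand-alone NumPy sketch (not Mask R-CNN's own code) that color-codes each pixel by its agreement with the ground truth. The function name and the color scheme are arbitrary choices:

```python
import numpy as np


def error_overlay(pred_mask, true_mask):
    """RGB image color-coding each pixel by agreement with the ground truth.

    Green = true positive, red = false positive, blue = false negative,
    black = true negative.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    overlay = np.zeros(pred.shape + (3,), dtype=np.uint8)
    overlay[pred & true] = (0, 255, 0)
    overlay[pred & ~true] = (255, 0, 0)
    overlay[~pred & true] = (0, 0, 255)
    return overlay


truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = truth.copy()
pred[2, 2] = False  # missed tumor pixel -> blue
pred[0, 0] = True   # spurious pixel -> red

overlay = error_overlay(pred, truth)
print(overlay[0, 0], overlay[1, 1], overlay[2, 2])  # red, green, blue
```

The result can be displayed with, e.g., `matplotlib.pyplot.imshow(overlay)`.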

In your domain (medicine), standard statistical rules apply. Use a holdout set. Cross validate. Etc. (*)
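One way to set up the holdout set and cross-validation folds over a collection of images, sketched with NumPy only. For a clustering method, the folds would be used to tune parameters (such as the number of clusters) and the held-out images only for the final score. The function name and the 20% / 5-fold defaults are illustrative assumptions:

```python
import numpy as np


def holdout_and_folds(n_images, test_fraction=0.2, n_folds=5, seed=0):
    """Split image indices into a held-out test set plus CV folds.

    Returns (test_idx, folds), where folds is a list of
    (train_idx, val_idx) pairs covering the remaining images.
    """
    rng = np.random.RandomState(seed)  # fixed seed for a reproducible split
    order = rng.permutation(n_images)
    n_test = int(round(test_fraction * n_images))
    test_idx = np.sort(order[:n_test])
    rest = order[n_test:]
    folds = []
    for val_idx in np.array_split(rest, n_folds):
        train_idx = np.setdiff1d(rest, val_idx)  # everything not in this fold
        folds.append((train_idx, np.sort(val_idx)))
    return test_idx, folds


test_idx, folds = holdout_and_folds(20)
print("test images:", test_idx)
for train_idx, val_idx in folds:
    print(len(train_idx), "train /", len(val_idx), "validation")
```

If several images come from the same patient, split by patient rather than by image; otherwise near-duplicate slices can leak between the training and test sets.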

Note: although the literature is dauntingly large, I'd encourage you to look at some domain-relevant papers, as they may take fewer "statistical shortcuts" than other vision projects (e.g., digit recognition) accept.

"Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool" provides a summary of evaluation methods in your domain.
"Current methods in image segmentation" has about 2500 citations but is a little older.
"Review of MR image segmentation techniques using pattern recognition" is a little older still and will get you safely into "traditional" vision models.
"Automated Segmentation of MR Images of Brain Tumors" is largely about its segmentation validation process.
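Two measures that come up repeatedly in this literature are the Dice coefficient (overlap) and the Hausdorff distance (worst-case boundary disagreement). A brute-force NumPy sketch, assuming boolean 2D masks and distances in pixel units; the helper names are hypothetical:

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 |A & B| / (|A| + |B|) for boolean masks A and B."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = np.count_nonzero(pred) + np.count_nonzero(true)
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.count_nonzero(pred & true) / total


def hausdorff_distance(pred_mask, true_mask):
    """Symmetric Hausdorff distance, in pixels, between two non-empty masks.

    Brute force over all pairs of foreground pixels: fine for small masks,
    memory-hungry for large ones.
    """
    a = np.argwhere(np.asarray(pred_mask, dtype=bool)).astype(float)
    b = np.argwhere(np.asarray(true_mask, dtype=bool)).astype(float)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks need at least one foreground pixel")
    # d[i, j] = Euclidean distance between pixel a[i] and pixel b[j].
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    # Largest distance from a point of either set to the nearest point of the other.
    return max(d.min(axis=1).max(), d.min(axis=0).max())


truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = truth.copy()
pred[2, 2] = False
pred[0, 0] = True

print(dice_coefficient(pred, truth))    # 2 * 3 / (4 + 4) = 0.75
print(hausdorff_distance(pred, truth))  # sqrt(2): pixel (0, 0) is ~1.414 px from the tumor
```

The brute-force distance matrix grows with the product of the two foreground sizes, so for real images something like `scipy.spatial.distance.directed_hausdorff` (applied in both directions) is a better fit.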

Python

Besides the Mask R-CNN links above, scikit-learn provides some extremely user-friendly tools and is considered part of the standard scientific Python "stack".

Implementing the difference between images in python is trivial (using numpy). Here's an overkill SO link.
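A sketch of that NumPy difference, assuming two grayscale images of the same shape. The one trap is unsigned integer arithmetic: subtracting `uint8` arrays wraps around, so the sketch casts to a signed type first. The threshold of 50 is a hypothetical value:

```python
import numpy as np


def image_difference(img_a, img_b):
    """Absolute per-pixel difference between two same-shaped images."""
    # Cast to a signed type so that 10 - 20 gives -10 instead of wrapping
    # around to 246 as it would with uint8 arrays.
    a = np.asarray(img_a).astype(np.int32)
    b = np.asarray(img_b).astype(np.int32)
    return np.abs(a - b)


a = np.array([[10, 200], [0, 255]], dtype=np.uint8)
b = np.array([[20, 100], [0, 0]], dtype=np.uint8)

diff = image_difference(a, b)
print(diff)
print(np.count_nonzero(diff > 50))  # 2 pixels differ by more than 50
```

For two boolean masks, the analogous count of disagreeing pixels is simply `np.count_nonzero(mask_a != mask_b)`.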

Bounding-box intersection is easy to implement on your own in Python; I'd use a library like Shapely if you want to measure general polygon intersection.
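A pure-Python sketch of the do-it-yourself version for axis-aligned boxes, assuming `(x0, y0, x1, y1)` coordinates with `x1 > x0` and `y1 > y0`; the function names are hypothetical:

```python
def box_intersection_area(box_a, box_b):
    """Overlap area of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    width = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    height = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    # A negative width or height means the boxes do not overlap.
    return max(0, width) * max(0, height)


def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def box_iou(box_a, box_b):
    """Intersection over union of two boxes (0.0 when they are disjoint)."""
    inter = box_intersection_area(box_a, box_b)
    union = box_area(box_a) + box_area(box_b) - inter
    return inter / float(union) if union > 0 else 0.0


print(box_intersection_area((0, 0, 4, 4), (2, 2, 6, 6)))  # 4
print(box_iou((0, 0, 4, 4), (2, 2, 6, 6)))                # 4 / 28, about 0.143
print(box_iou((0, 0, 1, 1), (2, 2, 3, 3)))                # 0.0
```

If your boxes use inclusive pixel indices instead (so a one-pixel box has `x0 == x1`), add 1 to each width and height.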

Scikit-learn has some nice machine-learning evaluation tools, for example:

ROC curves
Cross validation
Model selection
A million others
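A sketch of the ROC-curve tool, assuming your method can produce a per-pixel tumor score rather than only a hard label (for instance some similarity to the tumor cluster; the score values below are made up). `sklearn.metrics.roc_curve` and `roc_auc_score` take one label and one score per sample, so the masks are flattened first:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Ground-truth tumor mask and a made-up per-pixel tumor score.
truth = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [0, 0, 0]], dtype=bool)
score = np.array([[0.10, 0.30, 0.80],
                  [0.20, 0.90, 0.60],
                  [0.05, 0.70, 0.20]])

# Both functions expect 1D arrays: one label and one score per pixel.
y_true = truth.ravel().astype(int)
y_score = score.ravel()

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print("AUC:", auc)  # 17 of 18 tumor/non-tumor pairs ranked correctly -> ~0.944

# One simple way to pick an operating point: maximize tpr - fpr.
best = np.argmax(tpr - fpr)
print("threshold:", thresholds[best], "tpr:", tpr[best], "fpr:", fpr[best])
```

Maximizing `tpr - fpr` (Youden's J) is only one way to pick an operating point; in a medical setting you may prefer to require a minimum recall and take the best precision above it.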

Literature Searching

One reason you may have trouble finding an answer is that you're trying to measure the performance of an unsupervised method, clustering, in a supervised learning arena. "Clusters" are fundamentally under-defined in mathematics (**). You want to be looking at the supervised learning literature for accuracy measures.
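One concrete consequence of this mismatch: clustering assigns arbitrary label ids, so before any supervised metric can be computed you need a rule for which cluster "is" the tumor. A sketch of one such rule (hypothetical helper, NumPy only) picks the cluster with the best IoU against the ground truth; because it peeks at the labels, it gives an optimistic, best-case score:

```python
import numpy as np


def best_matching_cluster(cluster_map, true_mask):
    """Return (label, iou) of the cluster that overlaps the tumor best."""
    cluster_map = np.asarray(cluster_map)
    true = np.asarray(true_mask, dtype=bool)
    best_label, best_iou = None, -1.0
    for label in np.unique(cluster_map):
        pred = cluster_map == label  # treat this cluster as "the tumor"
        union = np.count_nonzero(pred | true)
        iou = np.count_nonzero(pred & true) / float(union) if union else 0.0
        if iou > best_iou:
            best_label, best_iou = label, iou
    return best_label, best_iou


# Hypothetical 3-cluster result on a 4x4 image with a 2x2 tumor.
cluster_map = np.array([[0, 0, 0, 0],
                        [0, 2, 2, 1],
                        [0, 2, 1, 1],
                        [0, 0, 1, 1]])
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True

label, iou = best_matching_cluster(cluster_map, truth)
print(label, iou)  # label 2, IoU 0.75
```

A label-invariant alternative is `sklearn.metrics.adjusted_rand_score(truth.ravel(), cluster_map.ravel())`, which compares the whole partition without choosing a tumor cluster.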

There is literature on unsupervised learning/clustering, too, which looks for topological structure, generally. Here's a very introductory summary. I don't think that is what you want.

A common problem, especially at scale, is that supervised methods require labels, which can be time-consuming to produce accurately for dense segmentation. Object detection makes it a little easier.

There are some existing datasets for medicine ([1], [2], e.g.) and some ongoing research in label-less metrics. If none of these are options for you, then you may have to revert to considering it an unsupervised problem, but evaluation becomes very different in scope and utility.

Footnotes

[*] Vision people sometimes skip cross validation even though they shouldn't, mainly because the models are slow to fit and they're a lazy bunch. Please don't skip a train/test/validation split, or your results may be dangerously useless.

[**] You can find all sorts of "formal" definitions, but never two people who agree on which one is correct or most useful. Here's some denser reading.

answered Oct 02 '22 22:10

en_Knight
