
How to compute precision and recall in clustering?


People also ask

How are precision and recall calculated?

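Precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN are the counts of true positives, false positives and false negatives. The two are often combined into the F-measure, their harmonic mean. For example, a perfect precision and recall result in a perfect F-measure:

F-Measure = (2 * Precision * Recall) / (Precision + Recall)
F-Measure = (2 * 1.0 * 1.0) / (1.0 + 1.0)
F-Measure = (2 * 1.0) / 2.0 = 1.0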
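The arithmetic above can be checked with a small helper. This is a minimal sketch; the function name `f_measure` is hypothetical, and returning 0 when both precision and recall are 0 is an assumed convention (the formula divides by zero there).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-measure, F1)."""
    if precision + recall == 0:
        # Assumed convention: define F as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f_measure(1.0, 1.0))  # 1.0
print(f_measure(0.5, 1.0))  # about 0.667: the harmonic mean is pulled toward the lower value
```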

How is cluster purity calculated?

Purity is simple to calculate. Each cluster is assigned the most frequent class among its members. Purity is then the number of data points whose class matches the label assigned to their cluster, divided by the total number of data points.
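The purity calculation just described can be sketched in Python as follows. The input format (one cluster id and one true class label per data point, in the same order) and the name `purity` are assumptions for illustration.

```python
from collections import Counter


def purity(clusters, classes):
    """Purity of a clustering.

    clusters: cluster id for each data point
    classes:  ground-truth class label for each data point (same order)
    """
    if len(clusters) != len(classes) or not classes:
        raise ValueError("need two non-empty sequences of equal length")
    # Group the true class labels by the cluster each point was assigned to.
    members = {}
    for cluster_id, label in zip(clusters, classes):
        members.setdefault(cluster_id, []).append(label)
    # Label each cluster with its most frequent class and count the points that agree.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return correct / len(classes)


clusters = [0, 0, 0, 1, 1, 1, 2, 2]
classes = ["a", "a", "b", "b", "b", "c", "c", "c"]
print(purity(clusters, classes))  # 0.75 (2 + 2 + 2 correct out of 8)
```

Note that purity alone rewards many small clusters (one point per cluster gives purity 1.0), which is one reason precision/recall-style measures are also used.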

What is F-measure in clustering?

The F-measure can be used to evaluate the accuracy of clustering algorithms. It combines precision and recall, two measures taken from information retrieval.
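One common way to turn this into a single clustering score matches each true class to the cluster with the best F-measure for it, then averages those best scores weighted by class size. This is a sketch of that particular definition, not necessarily the exact variant used in the work quoted above; the name `clustering_f_measure` and the input format (one cluster id and one class label per point) are hypothetical.

```python
from collections import Counter


def clustering_f_measure(clusters, classes):
    """Class-weighted F-measure of a clustering against ground-truth classes."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    # joint[(i, j)]: number of points with true class i placed in cluster j
    joint = Counter(zip(classes, clusters))
    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            p = n_ij / clu_size  # precision of cluster clu with respect to class cls
            r = n_ij / cls_size  # recall of cluster clu with respect to class cls
            best = max(best, 2 * p * r / (p + r))
        # Weight each class's best F-measure by its share of the data.
        total += (cls_size / n) * best
    return total


clusters = [0, 0, 0, 1, 1, 1, 2, 2]
classes = ["a", "a", "b", "b", "b", "c", "c", "c"]
print(clustering_f_measure(clusters, classes))  # about 0.75
```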

What is model precision and recall?

Precision and recall are two extremely important model evaluation metrics. Precision is the percentage of your results that are relevant, while recall is the percentage of all relevant results that your algorithm correctly identifies.
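For binary labels, these two percentages reduce to counts of true positives (TP), false positives (FP) and false negatives (FN). The sketch below assumes labels encoded as 1 (relevant) and 0 (not relevant); `precision_recall` is a hypothetical helper name. scikit-learn's `precision_score` and `recall_score` in `sklearn.metrics` compute the same quantities.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = relevant, 0 = not relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant and returned
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # returned but not relevant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```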


I am really confused about how to compute precision and recall in clustering applications.

I have the following situation:

There are two sets, A and B. Using a unique key for each element, I can determine which elements of A and B match. I want to cluster those elements based on their features (without using the unique key, of course).

I am doing the clustering, but I am not sure how to compute precision and recall. According to the paper "Extended Performance Graphs for Cluster Retrieval" (http://staff.science.uva.nl/~nicu/publications/CVPR01_nies.pdf), the formulas are:

p = precision = relevant retrieved items / retrieved items
r = recall = relevant retrieved items / relevant items

I really do not get what elements fall under which category.

So far, I have counted how many matching pairs (according to the unique key) end up within the same cluster. Is that already precision or recall? If so, which one is it, and how can I compute the other?
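One common reading of the formulas for this setup is pair counting: treat every pair (a, b) with a in A and b in B as an item, call it "relevant" if the two elements share the unique key and "retrieved" if the clustering put them in the same cluster. This is an assumption, not necessarily the definition used in either paper; the function name and the (unique_key, cluster_id) input format are hypothetical.

```python
from itertools import product


def pairwise_precision_recall(a_items, b_items):
    """Pair-counting precision and recall for matching A against B by clustering.

    a_items, b_items: lists of (unique_key, cluster_id) tuples, one per element
    (a hypothetical input format). A pair (a, b) is "relevant" if the keys are
    equal and "retrieved" if the clustering put a and b in the same cluster.
    """
    relevant = retrieved = relevant_retrieved = 0
    # Enumerate every cross pair (a, b); this is O(|A| * |B|).
    for (key_a, cluster_a), (key_b, cluster_b) in product(a_items, b_items):
        same_key = key_a == key_b
        same_cluster = cluster_a == cluster_b
        if same_key:
            relevant += 1
        if same_cluster:
            retrieved += 1
        if same_key and same_cluster:
            relevant_retrieved += 1  # matching pairs found within a cluster
    precision = relevant_retrieved / retrieved if retrieved else 0.0
    recall = relevant_retrieved / relevant if relevant else 0.0
    return precision, recall


a_items = [("k1", 0), ("k2", 0), ("k3", 1)]
b_items = [("k1", 0), ("k2", 1), ("k3", 1)]
p, r = pairwise_precision_recall(a_items, b_items)
print(p, r)  # precision 2/4 = 0.5, recall 2/3
```

Under this reading, the count of matching pairs within clusters is the shared numerator ("relevant retrieved items"). Precision divides it by the number of (a, b) pairs that share a cluster; recall divides it by the total number of matching pairs. For large sets, counting pairs per cluster instead of enumerating all cross pairs would be faster.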

Update: I just found another paper with the title "An F-Measure for Evaluation of Unsupervised Clustering with Non-Determined Number of Clusters" at http://mtg.upf.edu/files/publications/unsuperf.pdf.