
Understanding Recall and Precision

I am currently learning information retrieval and I am rather stuck on an example of recall and precision.

A searcher uses a search engine to look for information. There are 10 documents on the first screen of results and 10 on the second.

Assume there are known to be 10 relevant documents in the search engine's index.

So there are 20 results altogether, of which 10 are relevant.

Can anyone help me make sense of this?

Thanks

Bob Marks asked Jan 28 '14 18:01



1 Answer

Recall and precision measure the quality of your result. To understand them, let's first define the possible outcomes. With respect to your returned list, each document in the index is either

  • classified correctly

    • a true positive (TP): a document which was returned (positive) and is indeed relevant (true)
    • a true negative (TN): a document which was NOT returned (negative) and is indeed not relevant (true)
  • misclassified

    • a false positive (FP): a document which was returned (positive) but is not relevant (false)
    • a false negative (FN): a document which was NOT returned (negative) but is relevant (false)

The precision is then:

|TP| / (|TP| + |FP|)

i.e. the fraction of retrieved documents which are indeed relevant.

The recall is then:

|TP| / (|TP| + |FN|)

i.e. the fraction of relevant documents which are in your result set.

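So, taking your reading that 10 out of the 20 results are relevant, you have |TP| = 10 and |FP| = 10, which gives a precision of 10 / 20 = 0.5. Since the index is known to contain exactly these 10 relevant documents and all of them were returned, |FN| = 0 and you have a recall of 10 / 10 = 1. If fewer of the relevant documents had shown up in the 20 results, both numbers would drop accordingly.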
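To make the arithmetic concrete, here is a minimal Python sketch. The helper name `precision_recall` and the document IDs are made up for illustration, and it assumes (as in the question's reading) that all 10 relevant documents appear among the 20 results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) given retrieved and relevant document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # returned and relevant
    fp = len(retrieved - relevant)   # returned but not relevant
    fn = len(relevant - retrieved)   # relevant but not returned
    # Precision is 0/0 for an empty result list; report None in that case.
    precision = tp / (tp + fp) if retrieved else None
    recall = tp / (tp + fn) if relevant else None
    return precision, recall


# Hypothetical IDs: results are documents 1..20, relevant ones are 1..10.
retrieved = range(1, 21)
relevant = range(1, 11)

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 1.0
```

Note that true negatives never enter either formula, so the total size of the index does not matter here.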

(When measuring the performance of an information retrieval system, it only makes sense to consider precision and recall together. You can easily avoid every false positive by returning no result at all, or almost none (no spurious returned instance => no FP; with an empty result list precision is strictly 0/0, which is usually treated as 100% by convention), and you can get a recall of 100% by returning every instance (no relevant document was missed => no FN). Neither system would be useful.)
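The degenerate cases can be checked with a short Python sketch. The index size of 100 documents is an assumption made only for this illustration (the question does not state it), and `precision_recall` is a hypothetical helper, not part of any library.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall); precision is None for an empty result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else None
    recall = tp / len(relevant) if relevant else None
    return precision, recall


index = set(range(1, 101))    # assumed: 100 documents in the index
relevant = set(range(1, 11))  # 10 of them are relevant

# Return every document: nothing is missed, but most results are noise.
p_all, r_all = precision_recall(index, relevant)

# Return one document that is certainly relevant: no noise, but most is missed.
p_one, r_one = precision_recall({1}, relevant)

# Return nothing: no false positives, precision is undefined (0/0), recall is 0.
p_none, r_none = precision_recall(set(), relevant)

print(p_all, r_all)    # 0.1 1.0
print(p_one, r_one)    # 1.0 0.1
print(p_none, r_none)  # None 0.0
```

Each strategy maximizes one measure while the other collapses, which is why the two are reported together.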

spike answered Sep 30 '22 17:09