I am currently learning information retrieval and I am rather stuck on an example of recall and precision. A searcher uses a search engine to look for information. There are 10 documents on the first screen of results and 10 on the second. Assume there are known to be 10 relevant documents in the search engine's index. So there are 20 results altogether, of which 10 are relevant. Can anyone help me make sense of this? Thanks


Understanding Recall and Precision

1 Answer

Recall and precision measure the quality of your result. To understand them, let's first define the types of outcomes. With respect to a query, every document in the index is either

classified correctly
- a true positive (TP): a document which is relevant (positive) and was indeed returned (true)
- a true negative (TN): a document which is not relevant (negative) and was indeed NOT returned (true)
misclassified
- a false positive (FP): a document which is not relevant but was returned (wrongly treated as positive)
- a false negative (FN): a document which is relevant but was not returned (wrongly treated as negative)

The precision is then:

|TP| / (|TP| + |FP|)

i.e. the fraction of retrieved documents which are indeed relevant.

The recall is then:

|TP| / (|TP| + |FN|)

i.e. the fraction of relevant documents which are in your result set.

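So, in your example, 10 out of 20 results are relevant: TP = 10 and FP = 10, which gives you a precision of 10 / (10 + 10) = 0.5. If there are no more than these 10 relevant documents in the index (as your question states), then FN = 0 and you have a recall of 10 / (10 + 0) = 1.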
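The arithmetic for this example can be checked with a minimal Python sketch. The function name `precision_recall` and the document ids `d0` to `d19` are made up for illustration; the only facts taken from the question are that 20 documents were returned and that the 10 relevant ones are among them.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant and returned
    fp = len(retrieved - relevant)   # returned but not relevant
    fn = len(relevant - retrieved)   # relevant but not returned
    # Both ratios are undefined (0/0) when their denominator is empty.
    precision = tp / (tp + fp) if retrieved else None
    recall = tp / (tp + fn) if relevant else None
    return precision, recall


# Hypothetical ids: d0..d9 are the relevant documents, d10..d19 are not.
relevant = {f"d{i}" for i in range(10)}
retrieved = {f"d{i}" for i in range(20)}  # two result screens of 10 each

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 1.0
```

True negatives never appear in either formula, which is why the total size of the index does not matter here.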

(When measuring the performance of an information retrieval system, it only makes sense to consider precision and recall together. You can trivially avoid every false positive by returning no result at all (no spurious document => no FP; the precision is then strictly speaking 0/0, so undefined, though it is sometimes taken to be 100%), or get a recall of 100% by returning every document in the index (no relevant document was missed => no FN).)
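To see why either number alone is misleading, here is a small sketch of the two extremes. The index size of 100 documents is an assumption made purely for illustration (the question only says that 10 documents are relevant), and the "cautious" strategy returns a single relevant document rather than nothing, so that precision stays well defined.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall); assumes both sets are non-empty."""
    tp = len(retrieved & relevant)
    return tp / len(retrieved), tp / len(relevant)


# Assumed index of 100 documents, of which ids 0..9 are relevant.
index = set(range(100))
relevant = set(range(10))

# Strategy 1: return every document in the index.
p_all, r_all = precision_recall(index, relevant)

# Strategy 2: return only one document that is certainly relevant.
p_one, r_one = precision_recall({0}, relevant)

print(p_all, r_all)  # 0.1 1.0 -> perfect recall, poor precision
print(p_one, r_one)  # 1.0 0.1 -> perfect precision, poor recall
```

Neither strategy is a useful search engine, yet each one scores perfectly on one of the two measures.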

answered Sep 30 '22 17:09

spike


Tags:

search-engine

information-retrieval

precision-recall

Bob Marks
