Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to show that NDCG score is significant

Suppose the NDCG score for my retrieval system is .8. How do I interpret this score. How do i tell the reader that this score is significant?

like image 510
Programmer Avatar asked Feb 27 '12 16:02

Programmer


2 Answers

To understand this lets check an example of Normalized Discounted Cumulative Gain (nDCG)
For nDCG we need DCG and Ideal DCG (IDCG)
Lets understand what is Cumulative Gain (CG) first,

Example: Suppose we have [Doc_1, Doc_2, Doc_3, Doc_4, Doc_5]
Doc_1 is 100% relevant
Doc_2 is 70% relevant
Doc_3 is 95% relevant
Doc_4 is 20% relevant
Doc_5 is 100% relevant

So our Cumulative Gain (CG) is

CG = 100 + 70 + 95 + 20 + 100  ###(Index of the doc doesn't matter)
   = 385

and
Discounted cumulative gain (DCG) is

DCG = SUM( relivencyAt(index) / log2(index + 1) ) ###where index 1 -> 5

Doc_1 is 100 / log2(2) = 100.00
Doc_2 is 70  / log2(3) = 044.17
Doc_3 is 95  / log2(4) = 047.50
Doc_4 is 20  / log2(5) = 008.61
Doc_5 is 100 / log2(6) = 038.69

DCG = 100 + 44.17 + 47.5 + 8.61 + 38.69
DCG = 238.97

and Ideal DCG is

IDCG = Doc_1 , Doc_5, Doc_3, Doc_2, Doc_4

Doc_1 is 100 / log2(2) = 100.00
Doc_5 is 100 / log2(3) = 063.09
Doc_3 is 95  / log2(4) = 047.50
Doc_2 is 75  / log2(5) = 032.30
Doc_4 is 20  / log2(6) = 007.74

IDCG = 100 + 63.09 + 47.5 + 32.30 + 7.74
IDCG = 250.63

nDCG(5) = DCG    / IDCG
        = 238.97 / 250.63
        = 0.95

Conclusion:

In the given example nDCG was 0.95, 0.95 is not prediction accuracy, 0.95 is the ranking of the document effective. So, the gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks.
Wiki reference

like image 94
Wazy Avatar answered Oct 06 '22 23:10

Wazy


The NDCG is a ranking metric. In the information retrieval field you should predict a sorted list of documents and them compare it with a list of relevant documents. Imagine that you predicted a sorted list of 1000 documents and there are 100 relevant documents, the NDCG equals 1 is reached when the 100 relevant docs have the 100 highest ranks in the list.

So .8 NDCG is 80% of the best ranking.

This is an intuitive explanation the real math includes some logarithms, but it is not so far from this.

like image 45
Augusto Avatar answered Oct 06 '22 22:10

Augusto