Rank-based recommendation systems use NDCG to evaluate recommendation accuracy. However, accuracy rate and recall rate are sometimes used to evaluate top-N recommendation. Does a high NDCG mean the accuracy rate is also high? I ran the ListRankMF algorithm, and its accuracy rate on the MovieLens 100k dataset is very low, only about 8%. What is the relation between NDCG and accuracy rate?
nDCG stands for “Normalised Discounted Cumulative Gain”, a measure of ranking quality. It is widely used to evaluate recommendation systems and can be computed on data held in a Python DataFrame.
NDCG is a measure of ranking quality. In information retrieval, such measures are used to assess document retrieval algorithms. In this article, we will cover the following: Justification for using a measure for ranking quality to evaluate a recommendation engine.
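en.wikipedia.org/wiki/Discounted_cumulative_gain nDCG normalises DCG so that the values fall between 0 and 1 and have a "natural" interpretation. A score of 1 means the hits in a search are perfectly ordered by relevance, while 0 means none of the returned hits is relevant. Values in between say how close the ordering comes to the ideal: 0.5, for instance, means the list achieves half of the ideal discounted gain, which is not the same as half the hits being ordered correctly.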
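To make the 0-to-1 range concrete, here is a minimal sketch of NDCG. It assumes linear gain, a log2 position discount, and an ideal ordering built only from the grades present in the list; the function names `dcg_at_k` and `ndcg_at_k` and the example grades are illustrative, not taken from any particular library.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG of the list as ranked, divided by the DCG of the same grades sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Assumed relevance grades (3 = most relevant, 0 = irrelevant).
perfect = ndcg_at_k([3, 2, 1, 0], k=4)         # already in ideal order
reversed_order = ndcg_at_k([0, 1, 2, 3], k=4)  # worst ordering of the same items
no_hits = ndcg_at_k([0, 0, 0, 0], k=4)         # nothing relevant was returned

print(perfect, round(reversed_order, 3), no_hits)
```

Under these assumptions the fully reversed list still scores roughly 0.61, so a low NDCG reflects missing or badly buried relevant items rather than simply "the opposite order".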
NDCG: Normalized Discounted Cumulative Gain.
NDCG is most helpful when the objective of the recommender system is to return some relevant results, and order is important. For example, recommending a translation, or recommending a bank account. It's not harmful if we miss relevant results, but for a good user experience we want them in a meaningful order.
Recall is most helpful when the objective of the recommender system is to return all relevant results, and order is unimportant. For example, a potential medical diagnosis or prescription. It is harmful if we miss a relevant result, since that might be the correct diagnosis or cure. The order is not important since we expect the medic to read through all the possibilities and use their expert knowledge for the final decision.
Suppose there are 5 drugs we could recommend that a doctor give a patient (A to E), and 5 that we should not recommend (F to J). Our recommender system outputs the recommendations A,B,C,D. This gives us the following evaluations:
Recall = 4/5 = 0.8
NDCG = 1.0
In this case recall clearly shows we did not do as well as we could (since we did not recommend drug E), whereas NDCG leads us to believe we made the perfect recommendations.
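The drug example can be checked with a short sketch. It assumes binary relevance (a drug is either appropriate or not) and an ideal DCG truncated at the length of the recommended list, which is the usual NDCG@k convention; the helper names are hypothetical.

```python
import math

relevant = {"A", "B", "C", "D", "E"}   # drugs we should recommend
recommended = ["A", "B", "C", "D"]     # system output, in ranked order

def recall(recommended, relevant):
    """Fraction of all relevant items that appear in the recommendations."""
    hits = sum(1 for item in recommended if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k with a log2 position discount."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    dcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))
    # The best possible top-k list holds min(|relevant|, k) relevant items.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

drug_recall = recall(recommended, relevant)
drug_ndcg = ndcg_at_k(recommended, relevant, k=len(recommended))

print(f"recall = {drug_recall:.2f}, NDCG@4 = {drug_ndcg:.2f}")
```

Because every one of the four slots holds a relevant drug, no reordering or substitution could raise NDCG@4, even though drug E was missed.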
If we were instead recommending books, then NDCG would be more appropriate. Recall is not so informative since there may be hundreds of relevant books, but we cannot expect a user to read through a list of hundreds of books to pick just one to read. NDCG would tell us if we are at least recommending some meaningful subset of what is possible.
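A rough sketch of the book scenario follows, together with a second case that bears on the original question about a low accuracy rate. Both data sets are invented for illustration: 300 relevant books with a top-10 list drawn entirely from them, and a user with a single relevant item ranked first among ten. The sketch assumes "accuracy rate" means precision at k, which the question does not state explicitly.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Share of the top-k slots filled with relevant items."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Share of all relevant items that appear in the top k."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k with a log2 position discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1)
              if item in relevant)
    idcg = sum(1.0 / math.log2(rank + 1)
               for rank in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical catalogue: 300 relevant books, top-10 list made up only of relevant ones.
relevant_books = {f"book_{i}" for i in range(300)}
top10_books = [f"book_{i}" for i in range(10)]
book_recall = recall_at_k(top10_books, relevant_books, 10)
book_ndcg = ndcg_at_k(top10_books, relevant_books, 10)

# Hypothetical user with one relevant item, ranked first among ten recommendations.
one_relevant = {"movie_42"}
top10_movies = ["movie_42"] + [f"other_{i}" for i in range(9)]
movie_precision = precision_at_k(top10_movies, one_relevant, 10)
movie_ndcg = ndcg_at_k(top10_movies, one_relevant, 10)

print(f"books:  recall@10 = {book_recall:.3f}, NDCG@10 = {book_ndcg:.2f}")
print(f"movies: precision@10 = {movie_precision:.2f}, NDCG@10 = {movie_ndcg:.2f}")
```

In the second case NDCG is at its maximum while precision is only 0.1, because precision is capped by how many relevant items the user has. When each test user has few relevant items, a high NDCG and a low accuracy rate can therefore coexist.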