
inputs for nDCG in sklearn

I'm unable to understand the input format of sklearn's ndcg_score: http://sklearn.apachecn.org/en/0.19.0/modules/generated/sklearn.metrics.ndcg_score.html

I have multiple queries, and for each of them the ranking probabilities have already been calculated. Now I need to calculate nDCG on the test set, and I would like to use sklearn's ndcg_score for that. The example given at the link is:

>>> y_true = [1, 0, 2]
>>> y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
>>> ndcg_score(y_true, y_score, k=2)
1.0

According to the site, y_true is the ground truth and y_score holds the probabilities. So my questions are:

  1. Is this example for just one query or for multiple queries?
  2. If it is for just one query, what does y_true represent: the original rankings?
  3. If it is for a single query, why are there multiple input probabilities?
  4. How can this method be applied to multiple queries and their resulting probabilities?

Yank Leo asked Apr 23 '18 20:04

1 Answer

You can look at it as something similar to a multiclass classification problem.

So, to answer your questions:

  1. Is this example for just one query or multiple queries?

One query

  2. If this is for just one query, what does y_true represent: the original rankings?

I would call it the relevance label of each document rather than a ranking, since it may contain duplicate values.

  3. If this is for a single query, why are there multiple input probabilities?

y_score is the probability distribution of each document over the classes. In your example, y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]] means that the 0th document belongs to class 1 (0.55 is the max), the 1st document belongs to class 0 (0.7 is the max), and the 2nd document belongs to class 2 (0.9 is the max). The documentation is lacking, and the example is misleading as well. It would be clearer if there were four documents.
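To make that reading concrete, here is a minimal sketch that takes the argmax of each row of y_score from the example. It only illustrates the interpretation above; it is not a reproduction of how ndcg_score computes its result internally, and the name predicted_class is my own.

```python
# Values copied from the documentation example quoted in the question.
y_true = [1, 0, 2]
y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]

# For each document (row), pick the index of the largest probability.
# `predicted_class` is a hypothetical name used only for this illustration.
predicted_class = [max(range(len(row)), key=row.__getitem__) for row in y_score]

print(predicted_class)            # [1, 0, 2]
print(predicted_class == y_true)  # True
```

Every document's most probable class matches its label in y_true, which is consistent with the example reporting a score of 1.0.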

  4. How can this method be applied to multiple queries and their resulting probabilities?

Compute the nDCG score for each query separately, then average those scores across all queries.
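A rough sketch of that per-query averaging, assuming the common textbook definition of nDCG (linear gain, log2 discount) rather than the exact function from the linked page. The helpers dcg_at_k and ndcg_at_k and the toy queries are hypothetical, written only for illustration, and each query here has one relevance label and one score per document.

```python
import math

def dcg_at_k(relevances, k):
    """DCG of the first k relevance labels (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(true_relevance, scores, k):
    """nDCG@k for one query: DCG of the predicted order over DCG of the ideal order."""
    # Rank documents by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal = sorted(true_relevance, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    # A query with no relevant documents gets 0.0 by convention here.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical test set: (relevance labels, predicted scores) per query.
queries = [
    ([3, 2, 0, 1], [0.9, 0.7, 0.2, 0.4]),  # scores rank documents ideally
    ([0, 1, 2], [0.8, 0.5, 0.1]),          # scores rank documents in reverse
]

per_query = [ndcg_at_k(rel, s, k=2) for rel, s in queries]
mean_ndcg = sum(per_query) / len(per_query)
print(per_query, mean_ndcg)
```

Queries may have different numbers of documents, which is one reason to score each query on its own before averaging.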

cindyxiaoxiaoli answered Nov 15 '22 07:11