I'm unable to understand the input format of sklearn's ndcg_score: http://sklearn.apachecn.org/en/0.19.0/modules/generated/sklearn.metrics.ndcg_score.html
Currently I have the following problem: I have multiple queries, and for each of them the ranking probabilities have been calculated successfully. The problem now is calculating nDCG for the test set, for which I would like to use sklearn's ndcg_score. The example given at the link is:
>>> y_true = [1, 0, 2]
>>> y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]]
>>> ndcg_score(y_true, y_score, k=2)
1.0
According to the site, y_true is the ground truth and y_score holds the probabilities. So the following are my questions:
You can look at it as similar to a multiclass classification problem.
So, to answer your questions:
- Is this example for just one query or multiple queries?
One query
- If this is for just one query, then what does y_true represent: the original rankings?
I would refer to it as the relevance label of each document rather than a ranking, since it may contain duplicate values.
- If this is for a single query, why do we have multiple input probabilities?
y_score is the probability distribution of each document over the classes. In your example, y_score = [[0.15, 0.55, 0.2], [0.7, 0.2, 0.1], [0.06, 0.04, 0.9]] means the 0th document belongs to class 1 (0.55 is the max), the 1st document belongs to class 0 (0.7 is the max) and the 2nd document belongs to class 2 (0.9 is the max). The documentation is lacking and the example is misleading as well. It would be better if there were four documents, so that the number of documents and the number of classes differ.
- How can this method be applied to multiple queries and their resulting probabilities?
Compute the nDCG score for each query separately, then average those scores across all queries.
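The averaging step can be sketched as follows. This is a minimal pure-Python illustration, not the sklearn function from the link: it assumes one predicted score per document (rather than a per-class probability matrix), the common gain 2^rel - 1 with a log2 discount, and a score of 0 for a query with no relevant documents. The helper names (dcg_at_k, ndcg_at_k, mean_ndcg) and the sample data are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of an already ordered list."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(true_relevance, predicted_scores, k):
    """nDCG@k for one query: rank documents by predicted score, compare with the ideal order."""
    # Order the true relevance labels by descending predicted score.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal = sorted(true_relevance, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    # Assumed convention: a query with no relevant documents scores 0.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg(queries, k):
    """Average the per-query nDCG@k values."""
    scores = [ndcg_at_k(rel, pred, k) for rel, pred in queries]
    return sum(scores) / len(scores)

# Hypothetical data: each tuple is (relevance labels, model scores) for one query.
queries = [
    ([2, 1, 0], [0.9, 0.5, 0.1]),  # model ranks documents in the ideal order
    ([0, 1, 2], [0.9, 0.5, 0.1]),  # model ranks documents in reverse order
]
per_query = [ndcg_at_k(rel, pred, k=3) for rel, pred in queries]
average = mean_ndcg(queries, k=3)
print(per_query, average)
```

Each query is scored independently, so queries with different numbers of documents can be mixed freely; only the final mean combines them.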