I have two methods that rank a list of strings differently, as well as what we can consider the "right" ranking of the list (i.e. a gold standard).
In other words:
```python
ranked_list_of_strings_1 = method_1(list_of_strings)
ranked_list_of_strings_2 = method_2(list_of_strings)
correctly_ranked_list_of_strings  # Some permutation of list_of_strings
```
How can I determine which method is better, considering that method_1 and method_2 are black boxes? Are there any methods to measure this available in SciPy, scikit-learn, or similar libraries?
In my specific case, I actually have a dataframe, and each method outputs a score. What matters is not the difference in score between the methods and the true scores, but that the methods get the ranking right (higher score means higher ranking for all columns).
```
      strings  scores_method_1  scores_method_2  true_scores
5714   aeSeOg             0.54              0.1          0.8
5741   NQXACs             0.15              0.3          0.4
5768   zsFZQi             0.57              0.7          0.2
```
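For reference, here is a minimal sketch (assuming pandas; df is just an illustrative name) that reconstructs the example dataframe above, so the snippets further down have concrete data to work with:

```python
import pandas as pd

# Toy reconstruction of the example dataframe from the question.
df = pd.DataFrame(
    {
        "strings": ["aeSeOg", "NQXACs", "zsFZQi"],
        "scores_method_1": [0.54, 0.15, 0.57],
        "scores_method_2": [0.1, 0.3, 0.7],
        "true_scores": [0.8, 0.4, 0.2],
    },
    index=[5714, 5741, 5768],
)
```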
In statistics, a rank correlation is any of several statistics that measure an ordinal association: the relationship between rankings of different ordinal variables, or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable.
To calculate the Kendall tau-b for the given data set, you can use the formula on the Wikipedia page. I count n0 = 10, n1 = 2, n2 = 1, nc = 2, nd = 6, so that τB = (nc − nd) / √((n0 − n1)(n0 − n2)) = (2 − 6) / √((10 − 2)(10 − 1)) = −4 / √72 ≈ −0.4714045.
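As a concrete sketch (assuming the question's dataframe is available as df, as reconstructed above), SciPy's scipy.stats.kendalltau can compare each method's scores against the true scores; the method with the higher tau agrees better with the gold ranking:

```python
from scipy.stats import kendalltau

# Rank-correlate each method's scores with the gold-standard scores.
# A tau closer to 1 means the method reproduces the true ordering more closely.
tau_1, p_1 = kendalltau(df["scores_method_1"], df["true_scores"])
tau_2, p_2 = kendalltau(df["scores_method_2"], df["true_scores"])

print(f"method_1: tau = {tau_1:.3f} (p = {p_1:.3f})")
print(f"method_2: tau = {tau_2:.3f} (p = {p_2:.3f})")
```

scipy.stats.spearmanr can be used in the same way if you prefer Spearman's rho.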
The scikit-learn library also seems to have an NDCG (and DCG) metric implemented now.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ndcg_score.html#sklearn.metrics.ndcg_score
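A minimal sketch of how ndcg_score could be applied to the example data (assuming the df reconstructed earlier); the function expects 2D arrays with one row per query, so each column is wrapped in a single row and true_scores serves as the graded relevance:

```python
import numpy as np
from sklearn.metrics import ndcg_score

# ndcg_score expects arrays of shape (n_queries, n_items); here everything
# belongs to a single "query", so each column is wrapped in one row.
true_relevance = df["true_scores"].to_numpy()[np.newaxis, :]

ndcg_1 = ndcg_score(true_relevance, df["scores_method_1"].to_numpy()[np.newaxis, :])
ndcg_2 = ndcg_score(true_relevance, df["scores_method_2"].to_numpy()[np.newaxis, :])

print(f"method_1 NDCG: {ndcg_1:.3f}")
print(f"method_2 NDCG: {ndcg_2:.3f}")
```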
You're looking for Normalized Discounted Cumulative Gain (NDCG). It's a metric commonly used in search-engine ranking to test the quality of the result ordering.
The idea is that you test your ranking (in your case the two methods) against user feedback through clicks (in your case the true rank). NDCG will tell you the quality of your ranking relative to the truth.
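For intuition, the computation behind NDCG can be sketched in a few lines (the dcg and ndcg helpers below are illustrative names, not part of any library): order the items by the method's scores, accumulate the true relevances with a logarithmic position discount, and normalise by the DCG of the ideal ordering:

```python
import numpy as np

def dcg(relevances):
    # Sum of relevances, each discounted by log2(position + 1), positions 1-based.
    relevances = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, len(relevances) + 2))
    return float(np.sum(relevances / discounts))

def ndcg(true_scores, method_scores):
    # Order the true relevances by the method's predicted scores (best first),
    # then normalise by the DCG of the ideal (true-score) ordering.
    true_scores = np.asarray(true_scores, dtype=float)
    order = np.argsort(method_scores)[::-1]
    return dcg(true_scores[order]) / dcg(np.sort(true_scores)[::-1])
```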
Python has a module, RankEval, that implements this metric (and some others, if you want to try them). The repo is here, and there is a nice IPython notebook with examples.