Given a query, I have a cosine score for a document. I also have the document's PageRank. Is there a standard, good way of combining the two?
I was thinking of multiplying them:
Total_Score = cosine-score * pagerank
Because if either the PageRank or the cosine score gets too low, the document is not interesting.
Or is it preferable to have a weighted sum?
Total_Score = weight1 * cosine-score + weight2 * pagerank
Is this better? With it, a page could have a zero cosine score but a high PageRank and still show up among the results.
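To make the difference between the two formulas concrete, here is a minimal sketch. It assumes both scores are already scaled to the range 0 to 1, and the equal weights are placeholders picked only for illustration, not recommended values.

```python
def product_score(cosine, pagerank):
    # A zero in either factor forces the total to zero.
    return cosine * pagerank


def weighted_sum_score(cosine, pagerank, w_cosine=0.5, w_pagerank=0.5):
    # Weights are hypothetical; in practice they would need tuning.
    return w_cosine * cosine + w_pagerank * pagerank


# Hypothetical page: no match with the query, but a high PageRank.
cosine, pagerank = 0.0, 0.9
print(product_score(cosine, pagerank))       # 0.0
print(weighted_sum_score(cosine, pagerank))  # 0.45
```

The product drops the unrelated page entirely, while the weighted sum still gives it a middling score.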
The weighted sum is probably better as a ranking rule.
It helps to break the problem up into a retrieval/filtering step and a ranking step. The problem you describe with the weighted sum (a zero-cosine page showing up in the results) then no longer arises.
The process outlined in this paper by Sergey Brin and Lawrence Page uses a variant of the vector/cosine model for retrieval and, it seems, some kind of weighted sum for the ranking, where the weights are determined by user activity (see section 4.5.1). Using this approach, a document with zero cosine would not get past the retrieval/filtering step and thus would not be considered for ranking.
You could consider using a harmonic mean. With a harmonic mean, the two scores are essentially averaged; however, low scores drag the average down more than they would in a regular (arithmetic) average.
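A rough sketch of that two-step idea, not the method from the paper: the function name, the tuple layout, the `min_cosine` cutoff and the weights are all assumptions made up for this example.

```python
def retrieve_then_rank(docs, min_cosine=0.0, w_cosine=0.7, w_pagerank=0.3):
    """docs is a list of (doc_id, cosine, pagerank) tuples (hypothetical layout)."""
    # Step 1, retrieval/filtering: keep only documents that match the query at all.
    candidates = [d for d in docs if d[1] > min_cosine]

    # Step 2, ranking: order the survivors by a weighted sum of the two scores.
    def total_score(doc):
        _, cosine, pagerank = doc
        return w_cosine * cosine + w_pagerank * pagerank

    return sorted(candidates, key=total_score, reverse=True)


docs = [
    ("a", 0.8, 0.2),   # good match, low PageRank
    ("b", 0.0, 0.95),  # no match, high PageRank: filtered out in step 1
    ("c", 0.6, 0.9),   # decent match, high PageRank
]
print([doc_id for doc_id, _, _ in retrieve_then_rank(docs)])  # ['c', 'a']
```

Document "b" never reaches the ranking step, so its high PageRank cannot push it into the results.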
You could use:
Total_Score = 2*(cosine-score * pagerank) / (cosine-score + pagerank)
Let's say PageRank scored 0.1 and cosine 0.9. The normal average of these two numbers would be (0.1 + 0.9)/2 = 0.5, while the harmonic mean would be 2*(0.9*0.1)/(0.9 + 0.1) = 0.18.
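As a small sketch of the same calculation in code (the function names are made up for this example, and the guard for two zero scores is an added assumption, since the formula would otherwise divide by zero):

```python
def arithmetic_mean(cosine, pagerank):
    return (cosine + pagerank) / 2


def harmonic_mean(cosine, pagerank):
    total = cosine + pagerank
    if total == 0:
        # Assumption: treat two zero scores as a total score of zero.
        return 0.0
    return 2 * cosine * pagerank / total


print(round(arithmetic_mean(0.9, 0.1), 2))  # 0.5
print(round(harmonic_mean(0.9, 0.1), 2))    # 0.18
```

Note that a zero in either score also gives a harmonic mean of zero, so in that respect it behaves like the product.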