I need to provide a weighted sort on 2+ factors, ordered by "relevancy". However, the factors aren't completely isolated, in that I want one or more of the factors to affect the "urgency" (weight) of the others.
Example: contributed content (articles) can be up-/down-voted, and thus have a rating; they have a post date, and they're also tagged with categories. Users write the articles and can vote, and may or may not have some kind of ranking themselves (expert, etc). Probably similar to StackOverflow, right?
I want to provide each user with a list of articles grouped by tag but sorted by "relevancy", where relevancy is calculated based on the rating and age of the article, and possibly affected by the ranking of the author. I.E. a highly ranked article that was written several years ago may not necessarily be as relevant as a medium ranked article written yesterday. And maybe if an article was written by an expert it would be treated as more relevant than one written by "Joe Schmoe".
Another good example would be assigning hotels a "meta score" comprised of price, rating, and attractions.
My question is, what is the best algorithm for multiple factor sorting? This may be a duplicate of that question, but I'm interested in a generic algorithm for any number of factors (a more reasonable expectation is 2 - 4 factors), preferably a "fully-automatic" function that I don't have to tweak or require user input, and I can't parse linear algebra and eigenvector wackiness.
Possibilities I've found so far:
Note: S
is the "sorting score"
S = (w1 * F1) + (w2 * F2) + (w3 * F3)
, where wx
are arbitrarily assigned weights, and Fx
are the values of the factors. You'd also want to normalize F
(i.e. Fx_n = Fx / Fmax
). I think this is kinda how Lucene search works.S = 1000 * F1 + 100 * F2 + 10 * F3 ...
.S = (F2 / F2_max * F1) + ((1 - (F2 / F2_max)) * F1_avg)
, where F1
is the "more important" factor ("bounce rate" in the article), and F2
is the "significance modifying" factor ("visits" in the article).S = (F2 / (F2+F2_lim)) * F1 + (F2_lim / (F2+F2_lim)) × F1_avg
, where Fx
are the same as #3, and F2_lim
is the minimum threshold limit for the "significance" factor (i.e. any value less than X shouldn't be considered).Options #3 or #4 look really promising, since you don't really have to choose an arbitrary weighting scheme like you do in #1 and #2, but the problem is how do you do this for more than two factors?
I also came across the SQL implementation for a two-factor weighting algorithm, which is basically what I'll need to write eventually.
As mentioned in the comments, I would suggest what's called the 'compromise solution' to anyone with a similar problem who is more concerned with not having to set weights than with making one criterion more heavily weighted than the others.
Basically, you consider each of your criterion as a coordinate (after normalization, of course). Based on your judgement, you choose the absolute optimal point, e.g. in this case, the highest rank author, the newest article, etc. Once you choose the optimal solution, each other 'solution' is rated based on its distance from that optimal. A sample formula would be the inverse of the Euclidean distance for each article's score: S = 1/(sqrt((rank - rank_ideal)^2 + (age - age_ideal)^2 + ... + (xn - xn_ideal)^2)).
This treats all criteria as equal, so keep that in mind.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With