 

Measuring similarity between document sets

For illustration purposes, let's assume this is a forum service. I need to calculate the "similarity" among each user's posts, so that the result would look something like:

among posts by user A, similarity 60%
among posts by user B, similarity 20%
...

I'm dealing with multibyte strings, so I guess I'm stuck with search engines here. We already use Solr and already have MoreLikeThis implemented, but I'm not sure how to construct the query. Any help appreciated!

asked May 20 '11 at 09:05 by jodeci


1 Answer

Carrot2 might interest you (along with this related blog).

answered Oct 18 '22 at 19:10 by Omnaest