I used tf-idf to calculate cosine similarity between two documents. It has some limitations and does not perform very well.
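For reference, here is a minimal sketch of the tf-idf plus cosine similarity baseline. It assumes a naive lowercase whitespace tokenizer and uses the smoothed idf variant log((1 + n) / (1 + df)) + 1. With the plain log(n / df), any term shared by both documents of a two-document corpus gets idf 0, so the similarity would always be 0. The example documents are made up. They show one commonly cited limitation: tf-idf only rewards exact term overlap, so "car" and "automobile" contribute nothing to each other's similarity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = len(toks) or 1  # avoid division by zero on empty docs
        vec = {
            # Smoothed idf, so terms present in every document keep a nonzero weight.
            t: (c / length) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


if __name__ == "__main__":
    docs = ["the car is fast", "the automobile is quick", "the car is red"]
    v = tfidf_vectors(docs)
    # "car"/"automobile" share no weight, so docs 0 and 1 score lower than 0 and 2.
    print(cosine(v[0], v[1]), cosine(v[0], v[2]))
```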
I looked into LDA (latent Dirichlet allocation) to calculate document similarity, but I don't know much about it, and I couldn't find much material about my problem either.
Can you please point me to a tutorial related to my problem? Or can you give me some advice on how I can achieve this task with LDA?
Thanks
P.S.: Also, is there any source code available to perform such a task with LDA?
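One common way to use LDA for similarity is to fit a topic model, represent each document by its topic mixture (theta), and compare those mixtures instead of raw term vectors. Below is a minimal, self-contained sketch using collapsed Gibbs sampling, which is one standard way to fit LDA. It is not how Lucene or Mahout implement it. The hyperparameters (alpha=0.1, beta=0.01, 200 iterations, a fixed seed) are illustrative assumptions. The Hellinger-based similarity is one common choice for comparing topic distributions; cosine on theta would also work. The sketch fits all documents jointly, so scoring a new, unseen document would need a separate inference step.

```python
import math
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists.
    Returns theta: one topic distribution (list of floats) per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens assigned to topic k in doc d
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
    # Posterior-mean estimate of each document's topic mixture.
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]


def hellinger_similarity(p, q):
    """1 minus the Hellinger distance; 1.0 means identical topic mixtures."""
    h = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))
    return 1.0 - h / math.sqrt(2)


if __name__ == "__main__":
    # Toy corpus (made up): two "car" documents and two "cooking" documents.
    corpus = [
        "car engine wheel car road",
        "automobile engine road wheel",
        "recipe oven flour bake",
        "bake flour sugar oven recipe",
    ]
    theta = lda_gibbs([d.split() for d in corpus], num_topics=2)
    for j in range(1, len(corpus)):
        print(0, j, round(hellinger_similarity(theta[0], theta[j]), 3))
```

Unlike the tf-idf baseline, documents that use different words from the same topic (here "car" and "automobile" both co-occur with "engine" and "road") can end up with similar theta vectors. On a corpus this small the result depends on the seed and iteration count, so treat the printed numbers as illustrative only.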
Have you had a look at Lucene and Mahout?
This might be useful: Latent Dirichlet Allocation with Lucene and Mahout.