
Differentiate between tf-idf document similarity and naive Bayes classifier

How do I choose between tf-idf document similarity and a naive Bayes classifier? I don't understand which one to use. Is there a method for identifying which algorithm suits which purpose?

asked Dec 21 '25 12:12 by madan ram



1 Answer

You don't.

Term frequency-inverse document frequency (tf-idf) is a method of assigning numeric values to features. It is (mostly) independent of the method used to classify the data points.
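To make the "weighting, not classifying" point concrete, here is a minimal sketch of one common tf-idf variant (relative term frequency times the log of inverse document frequency). The function name `tfidf_vectors` and the toy documents are assumptions for illustration; real libraries use slightly different smoothing and normalization.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stocks fell on the news".split(),
]
vectors = tfidf_vectors(docs)
```

Note that the output is only a set of weighted feature vectors. A term such as "the", which occurs in every document, gets weight zero, and nothing here assigns a class label. Any classifier can consume these vectors afterwards.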

I assume that by similarity you mean cosine similarity combined with nearest-neighbor classification.
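Under that reading, classification amounts to finding the most similar labeled document and copying its label. A rough sketch, assuming sparse vectors stored as dicts and using raw term counts to keep it short (tf-idf weights could be substituted); `nearest_neighbor_label` and the sample data are hypothetical:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_neighbor_label(query, training):
    """training: list of (vector, label). Compares the query to every stored document."""
    _, best_label = max(training, key=lambda pair: cosine(query, pair[0]))
    return best_label

# Hypothetical labeled examples.
training = [
    (Counter("cheap pills buy now".split()), "spam"),
    (Counter("meeting agenda for monday".split()), "ham"),
    (Counter("buy cheap watches now".split()), "spam"),
]
query = Counter("monday meeting notes".split())
predicted = nearest_neighbor_label(query, training)
```

Each prediction loops over the entire training set, so the cost of a single query grows linearly with the number of stored documents.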

Provided you are doing classification, you would choose whichever method gives you the best accuracy (or best meets your requirements). With very large data sets, computing the cosine similarity to every document in the data set becomes prohibitively expensive.
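For comparison, a naive Bayes classifier summarizes the training data into per-class counts, so prediction cost depends on the number of classes and query tokens rather than on the number of training documents. The following is a simplified multinomial sketch with add-one smoothing; the names `train_naive_bayes` and `predict` and the sample data are assumptions, not a reference implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (tokens, label). Returns count tables."""
    class_counts = Counter(label for _, label in labeled_docs)
    term_counts = defaultdict(Counter)
    for tokens, label in labeled_docs:
        term_counts[label].update(tokens)
    vocab = {t for tokens, _ in labeled_docs for t in tokens}
    return class_counts, term_counts, vocab

def predict(tokens, model):
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n / total_docs)  # log prior
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log(
                    (term_counts[label][t] + 1) / (total_terms + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled examples.
model = train_naive_bayes([
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch meeting on monday".split(), "ham"),
])
predicted = predict("monday meeting notes".split(), model)
```

Which approach gives better accuracy still has to be measured on held-out data for the task at hand.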

If you meant using cosine similarity to rank results (find documents similar to a query Q), then there is no "choice": that is a ranking task, whereas naive Bayes is for classification.
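A ranking task of this kind might look like the sketch below: every document gets a similarity score against Q and the results are sorted, with no class labels involved. Raw term counts are used as vectors for brevity, and the `rank` function and sample documents are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (score, document) pairs, most similar first."""
    query_vec = Counter(query.split())
    scored = [(cosine(query_vec, Counter(d.split())), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical document collection.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell on the news",
]
ranked = rank("cat on a mat", docs)
```

The output is an ordering of documents, not a predicted category, which is why a classifier is not a drop-in alternative here.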

In real life, neither method is particularly good. You would only use them to get an initial idea of how hard or easy a task might be by throwing the dumb, simple methods at it. If one "dumb" method performs significantly better than the others, you might consider trying more advanced models that are related to that best dumb method.

answered Dec 24 '25 11:12 by Raff.Edward



