Differentiate between tf-idf document similarity and naive Bayes classifier

Question

How do I choose between tf-idf document similarity and a naive Bayes classifier? I don't understand which one to use. Is there a method for identifying which algorithm is good for which purpose?

Raff.Edward · Accepted Answer

You don't.

Term frequency-inverse document frequency (TF-IDF) is a method of assigning numeric values to features. It is (mostly) independent of the method used to classify the data points.
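To make the "feature weighting" point concrete, here is a minimal sketch of TF-IDF in plain Python. It assumes one common smoothed IDF variant (other definitions exist), and the function name `tfidf_vectors` and the toy documents are made up for illustration. Note that the output is just a weighted vector per document; nothing in it says which classifier, if any, will consume it.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({
            # Raw count times a smoothed inverse document frequency
            # (one common variant, assumed here for illustration).
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

# Hypothetical toy corpus, whitespace-tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stocks fell on monday".split(),
]
vectors = tfidf_vectors(docs)
for vec in vectors:
    print(vec)
```

Per occurrence, a term that appears in only one document ("mat") gets a larger weight than one that appears in several ("the"). The resulting vectors can then be fed to nearest neighbor, naive Bayes, or anything else.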

I assume that by similarity you mean cosine similarity and nearest-neighbor classification.

Provided you are doing classification, you would choose whichever method gives you the best accuracy (or best meets your requirements). With very large data sets, computing the cosine similarity to every document in the data set becomes prohibitively expensive.
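As a rough sketch of "pick whichever scores better", the snippet below runs a cosine nearest-neighbor classifier and a multinomial naive Bayes classifier (add-one smoothing) on the same held-out documents. Everything here is an assumption for illustration: the function names, the tiny spam/ham data, and the use of raw term counts as features to keep it short (TF-IDF weights could be substituted on the nearest-neighbor side). The nearest-neighbor prediction scans every training document, which is the cost that becomes prohibitive at scale.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbor_predict(train, doc):
    # Label of the single most similar training document (1-NN).
    return max(train, key=lambda pair: cosine(pair[0], doc))[1]

def naive_bayes_predict(train, doc):
    # Multinomial naive Bayes with add-one (Laplace) smoothing.
    vocab = {t for vec, _ in train for t in vec}
    class_docs = Counter(label for _, label in train)
    term_counts = defaultdict(Counter)
    for vec, label in train:
        term_counts[label].update(vec)

    def log_posterior(label):
        total = sum(term_counts[label].values())
        score = math.log(class_docs[label] / len(train))
        for t, c in doc.items():
            if t in vocab:  # ignore terms never seen in training
                p = (term_counts[label][t] + 1) / (total + len(vocab))
                score += c * math.log(p)
        return score

    return max(class_docs, key=log_posterior)

def accuracy(predict, train, test):
    hits = sum(predict(train, vec) == label for vec, label in test)
    return hits / len(test)

def featurize(pairs):
    return [(Counter(text.split()), label) for text, label in pairs]

# Hypothetical toy data; a real comparison needs far more documents.
train = featurize([
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch meeting on monday", "ham"),
])
test = featurize([
    ("buy pills now", "spam"),
    ("monday lunch agenda", "ham"),
])

nn_acc = accuracy(nearest_neighbor_predict, train, test)
nb_acc = accuracy(naive_bayes_predict, train, test)
print("nearest neighbor:", nn_acc)
print("naive Bayes:     ", nb_acc)
```

On data this small the numbers mean nothing; the point is only the shape of the comparison: same features, same held-out set, then keep whichever method meets your requirements.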

If you meant using cosine similarity to rank results (find documents similar to Q), then there is no "choice": that is a ranking task, and naive Bayes is for classification.
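For contrast, a ranking task looks like the sketch below: there are no labels at all, only a query Q and a sort by similarity. The function name `rank_by_similarity`, the raw-count features, and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_similarity(query, docs):
    """Return (score, document) pairs, most similar to the query first."""
    q = Counter(query.split())
    scored = [(cosine(q, Counter(d.split())), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical toy corpus and query.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell on monday",
]
ranking = rank_by_similarity("cat on mat", docs)
for score, doc in ranking:
    print(round(score, 3), doc)
```

The output is an ordering of documents, not a class label, which is why naive Bayes is not a candidate for this job.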

In real life, neither method is particularly good. You would only use them to get an initial idea of how hard or easy a task might be by throwing the dumb, simple methods at it. If one "dumb" method performed significantly better than the others, you might consider trying more advanced models that are related to the best dumb method.

Tags:

machine-learning

madan ram
