
Naive Bayes vs. SVM for classifying text data

I'm working on a problem that involves classifying a large database of texts. The texts are very short (think 3-8 words each) and there are 10-12 categories into which I wish to sort them. For the features, I'm simply using the tf–idf frequency of each word. Thus, the number of features is roughly equal to the number of words that appear overall in the texts (I'm removing stop words and some others).

In trying to come up with a model to use, I've had the following two ideas:

  • Naive Bayes (likely the sklearn multinomial Naive Bayes implementation)
  • Support vector machine (with stochastic gradient descent used in training, also an sklearn implementation)

I have built both models, and am currently comparing the results.
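For concreteness, here is a minimal sketch of the two pipelines described above, using sklearn's `TfidfVectorizer` with `MultinomialNB` and `SGDClassifier` (hinge loss, i.e. a linear SVM trained by SGD). The toy texts and labels are hypothetical stand-ins for the short 3-8 word texts in the question:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the short texts in the question.
texts = ["cheap flights to london", "machine learning course online",
         "best pizza near me", "deep learning tutorial free",
         "discount hotel booking site", "order pizza delivery now"]
labels = ["travel", "education", "food", "education", "travel", "food"]

# tf-idf features feed each classifier; stop words removed as in the question.
nb = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
svm = make_pipeline(TfidfVectorizer(stop_words="english"),
                    SGDClassifier(loss="hinge", random_state=0))

nb.fit(texts, labels)
svm.fit(texts, labels)

nb_pred = nb.predict(["cheap hotel deals"])
svm_pred = svm.predict(["online pizza order"])
```

Both pipelines expose the same `fit`/`predict` interface, which makes a side-by-side comparison (e.g. with cross-validation) straightforward.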

What are the theoretical pros and cons of each model? Why might one be better suited to this type of problem? I'm new to machine learning, so I'd like to understand the reasoning, not just which one wins.

Many thanks!

Ryan asked Feb 12 '16



1 Answer

The biggest difference between the two models, from a features point of view, is that Naive Bayes treats the features as conditionally independent, whereas an SVM can capture interactions between them to some degree, provided you use a non-linear kernel (RBF/Gaussian, polynomial, etc.). If your data has feature interactions, and for a problem like yours it most likely does, an SVM will be better at capturing them, and hence better at the classification task you want.

The consensus among ML researchers and practitioners is that in almost all cases the SVM performs better than Naive Bayes.

From a theoretical point of view, it is a little hard to compare the two methods: one is probabilistic in nature, the other geometric. However, it's easy to come up with a function containing dependencies between variables that Naive Bayes cannot capture (e.g. y(a,b) = ab), so we know it isn't a universal approximator. SVMs with a proper choice of kernel are (as are 2-3 layer neural networks), so from that point of view the theory matches the practice.
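This independence limitation can be illustrated with the classic XOR interaction, a binary stand-in for a target like y(a,b) = ab that depends only on how the features combine. Bernoulli Naive Bayes and an RBF-kernel SVM are used here for the sketch (hyperparameters are illustrative choices, not tuned values):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

# XOR target: y depends on the *interaction* of a and b;
# each feature on its own carries no information about y.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

nb = BernoulliNB().fit(X, y)              # assumes conditional independence
svm = SVC(kernel="rbf", C=100).fit(X, y)  # RBF kernel can model the interaction

# Per feature, both classes look identical to Naive Bayes, so it cannot
# do better than chance here, while the kernel SVM fits all four points.
nb_acc = nb.score(X, y)
svm_acc = svm.score(X, y)
```

The point is not that Naive Bayes is broken, only that any signal living purely in feature interactions is invisible to it by construction.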

But in the end it comes down to performance on your problem: you basically want the simplest method that gives good enough results and runs fast enough. Spam detection, for example, has famously been tackled well by plain Naive Bayes; face detection in images by a similar method enhanced with boosting.

Horia Coman answered Oct 03 '22