 

Naive Bayes and Logistic Regression Error Rate

I have been trying to figure out how the error rate relates to the number of features in both of these models. I watched some videos, and the creator said that a simple model can be better than a complicated model. So I figured that the more features I had, the greater the error rate would be. This did not prove to be true in my work: when I had fewer features, the error rate went up. I'm not sure if I'm doing this incorrectly or if the person in the video made a mistake. Could someone explain? I am also curious how the number of features relates to Logistic Regression's error rate.

asked Oct 02 '13 by Taztingo


People also ask

Why is logistic regression more accurate than Naive Bayes?

If the features are correlated, Naive Bayes' independence assumption is violated and its classifications suffer. Logistic regression fits a linear decision boundary that splits the weight among correlated features, so it can still work well, and often gives better results than Naive Bayes, when features are correlated.

How is Naive Bayes different from logistic regression?

Naive Bayes is an example of a generative classifier, while Logistic Regression is an example of a discriminative classifier.

Why does Naive Bayes give less accuracy?

The assumption that all features are independent rarely holds in real life, which can make the Naive Bayes algorithm less accurate than more sophisticated algorithms.

Why is Naive Bayes better than logistic regression for text classification?

Because of its "naive" independence assumption, the Naive Bayes algorithm has very few parameters to estimate, so it can perform well on text classification compared with algorithms like Logistic Regression or tree-based methods, and its probability calculations are much faster.


1 Answer

Naive Bayes and Logistic Regression are a "generative-discriminative pair," meaning they have the same model form (a linear classifier), but they estimate parameters in different ways.

For features x and label y, naive Bayes estimates a joint probability p(x,y) = p(y)*p(x|y) from the training data (that is, it builds a model that could "generate" the data), and uses Bayes' rule to predict p(y|x) for new test instances. On the other hand, logistic regression estimates p(y|x) directly from the training data by minimizing an error function (which is more "discriminative").
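
To make the contrast concrete, here is a minimal sketch (assuming scikit-learn and a made-up synthetic binary-feature dataset; none of this comes from the original post). BernoulliNB estimates p(y) and p(x|y) from counts and predicts through Bayes' rule, while LogisticRegression fits p(y|x) directly, yet both end up as linear classifiers with one weight per feature:

    # Hedged sketch: generative (naive Bayes) vs. discriminative (logistic
    # regression) estimation on the same synthetic binary features.
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n, d = 500, 20
    y = rng.integers(0, 2, size=n)                    # labels
    rates = np.where(y[:, None] == 1, 0.7, 0.3)       # class-dependent feature rates
    X = (rng.random((n, d)) < rates).astype(int)      # binary features

    nb = BernoulliNB().fit(X, y)                      # learns p(y) and p(x|y), predicts via Bayes' rule
    lr = LogisticRegression(max_iter=1000).fit(X, y)  # learns p(y|x) directly by minimizing log loss

    # Both are linear classifiers: compare per-feature evidence/weights.
    nb_log_odds = nb.feature_log_prob_[1] - nb.feature_log_prob_[0]  # log p(x_j=1|y=1) - log p(x_j=1|y=0)
    print("NB per-feature log-likelihood ratios:", np.round(nb_log_odds[:5], 2))
    print("LR per-feature weights:              ", np.round(lr.coef_[0][:5], 2))

Either set of numbers can be read as "how much does turning this feature on push the prediction toward class 1"; the difference is only in how they were estimated.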

These differences have implications for error rate:

  1. When there are very few training instances, logistic regression might "overfit," because there isn't enough data to estimate p(y|x) reliably. Naive Bayes might do better because it models the entire joint distribution.
  2. When the feature set is large (and sparse, like word features in text classification), naive Bayes might "double count" features that are correlated with each other, because it assumes that each p(x|y) event is independent when it is not. Logistic regression can do a better job by naturally "splitting the difference" among these correlated features (illustrated in the sketch after this list).
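
The "double counting" in point 2 can be seen in a toy experiment. This is only a sketch under assumed scikit-learn defaults and invented data: one informative binary feature is duplicated ten times, so the copies are perfectly correlated. Naive Bayes multiplies the same evidence ten times and grows overconfident, while logistic regression spreads its weight across the copies and keeps roughly the same predicted probability:

    # Hedged sketch of the "double counting" effect with perfectly correlated
    # copies of one informative feature (scikit-learn assumed; data is synthetic).
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 1000
    y = rng.integers(0, 2, size=n)
    x = (rng.random(n) < np.where(y == 1, 0.8, 0.2)).astype(int)  # one informative feature
    noise = rng.integers(0, 2, size=(n, 5))                       # five uninformative features

    X_single = np.column_stack([x] + [noise])                     # informative feature once
    X_copied = np.column_stack([x] * 10 + [noise])                # same feature repeated 10 times

    for name, X in [("1 copy   ", X_single), ("10 copies", X_copied)]:
        nb = BernoulliNB().fit(X, y)
        lr = LogisticRegression(max_iter=1000).fit(X, y)
        probe = np.zeros((1, X.shape[1]))
        probe[0, :X.shape[1] - 5] = 1                             # informative column(s) on, noise off
        print(name,
              "NB p(y=1|x) =", round(nb.predict_proba(probe)[0, 1], 3),
              "LR p(y=1|x) =", round(lr.predict_proba(probe)[0, 1], 3))

The exact numbers depend on the seed, but the naive Bayes probability should shoot toward 1.0 once the feature is copied, while logistic regression stays near the single-copy value.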

If the features really are (mostly) conditionally independent, both models might actually improve with more and more features, provided there are enough data instances. The problem comes when the training set size is small relative to the number of features. Priors on naive Bayes feature parameters, or regularization methods (like L1/Lasso or L2/Ridge) on logistic regression can help in these cases.
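
As a rough illustration (the data sizes and hyperparameter values below are invented, and the exact numbers will vary with the random seed), Laplace smoothing (the alpha prior in BernoulliNB) and the L2 penalty (the C parameter in LogisticRegression) both rein in the parameter estimates when there are far more features than training examples:

    # Hedged sketch of smoothing/regularization in a small-sample, many-features
    # regime (scikit-learn assumed; everything here is synthetic and illustrative).
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    n, d = 60, 500                                  # few examples, many sparse features
    y = rng.integers(0, 2, size=n)
    rates = np.where(y[:, None] == 1, 0.06, 0.04)   # weakly informative sparse features
    X = (rng.random((n, d)) < rates).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=0, stratify=y)

    for alpha in (1e-9, 1.0):                       # near-zero smoothing vs. Laplace prior
        acc = BernoulliNB(alpha=alpha).fit(X_tr, y_tr).score(X_te, y_te)
        print(f"NB alpha={alpha}: test accuracy {acc:.2f}")

    for C in (1e6, 1.0):                            # near-zero penalty vs. default L2 strength
        acc = LogisticRegression(C=C, max_iter=5000).fit(X_tr, y_tr).score(X_te, y_te)
        print(f"LR C={C}: test accuracy {acc:.2f}")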

answered Sep 27 '22 by burr