What is a weak learner?

Tags: machine-learning, ensemble-learning

I want to compare different error rates of different classifiers with the error rate from a weak learner (better than random guessing). So, my question is, what are a few choices for a simple, easy to process weak learner? Or, do I understand the concept incorrectly, and is a weak learner simply any benchmark that I choose (for example, a linear regression)?

860

asked Dec 06 '13 23:12

Stu

2 Answers

"better than random guessing"

That is basically the only requirement for a weak learner. As long as it consistently beats random guessing, any true boosting algorithm can increase the accuracy of the final ensemble. Which weak learner you should choose is then a trade-off among three factors:

The bias of the model. A lower bias is almost always better, but you don't want to pick something that will overfit (yes, boosting can and does overfit).
The training time for the weak learner. Generally we want to be able to train a weak learner quickly, as we are going to be building a few hundred (or a few thousand) of them.
The prediction time for the weak learner. If we use a model that is slow at prediction time, an ensemble of them is going to be a few hundred times slower!

The classic weak learner is a decision tree. By changing the maximum depth of the tree, you can control all three factors, which makes decision trees incredibly popular for boosting. What you should use depends on your individual problem, but decision trees are a good starting point.

NOTE: As long as the algorithm supports weighted data instances, any algorithm can be used for boosting. A guest speaker at my university was boosting 5-layer-deep neural networks for his work in computational biology.
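To make the point about weighted instances concrete, here is a minimal sketch of AdaBoost in plain Python, using one-feature threshold stumps as the weak learner. The function names (`fit_weighted_stump`, `adaboost`, `ensemble_predict`) and the ten-point toy dataset are assumptions made up for illustration, not anything from the question. The only thing the booster asks of the weak learner is that it can minimize a weighted error.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, polarity); hypothetical representation


def stump_predict(stump: Stump, xi: float) -> int:
    """Predict `polarity` when xi is above the threshold, else `-polarity`."""
    threshold, polarity = stump
    return polarity if xi > threshold else -polarity


def fit_weighted_stump(x: List[float], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Return the stump with the lowest weighted error, and that error."""
    values = sorted(set(x))
    # Candidate thresholds: one below the minimum, then midpoints between neighbours.
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best, best_err = (candidates[0], 1), float("inf")
    for t in candidates:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w)
                      if stump_predict((t, polarity), xi) != yi)
            if err < best_err:
                best, best_err = (t, polarity), err
    return best, best_err


def adaboost(x: List[float], y: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    """Train up to `rounds` weighted stumps; labels must be -1 or +1."""
    n = len(x)
    w = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_weighted_stump(x, y, w)
        if err >= 0.5:
            break  # the weak learner no longer beats chance on the weighted data
        err = max(err, 1e-12)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble: List[Tuple[float, Stump]], xi: float) -> int:
    score = sum(alpha * stump_predict(stump, xi) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble: List[Tuple[float, Stump]], x: List[float], y: List[int]) -> float:
    return sum(ensemble_predict(ensemble, xi) == yi for xi, yi in zip(x, y)) / len(x)


# Toy data (made up): no single threshold separates the two classes.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(x, y, rounds=1)
boosted = adaboost(x, y, rounds=3)
print("one stump:", accuracy(single, x, y))
print("three boosted stumps:", accuracy(boosted, x, y))
```

On this toy data a single stump gets 7 of the 10 training points right, while three boosted stumps classify all 10 correctly. That is training accuracy only, so it says nothing about overfitting.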

141

answered Oct 07 '22 19:10

Raff.Edward

A simple and common weak learner is a threshold on a single feature. The standard example is a one-level decision tree, called a decision stump, used in bagging or boosting. It chooses a threshold on one feature and splits the data at that threshold (for example, deciding whether an iris flower is Iris versicolor or Iris virginica based on petal width). Bagging or AdaBoost then trains many such stumps and combines their predictions.
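As a rough illustration of the stump described above, the following sketch fits a single threshold on one feature in plain Python. The petal widths are made-up values chosen for the example (not the real iris measurements), and `fit_stump` and `stump_predict` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

Stump = Tuple[float, int]  # (threshold, polarity); hypothetical representation


def stump_predict(stump: Stump, xi: float) -> int:
    """Predict `polarity` when xi is above the threshold, else `-polarity`."""
    threshold, polarity = stump
    return polarity if xi > threshold else -polarity


def fit_stump(x: List[float], y: List[int],
              w: Optional[List[float]] = None) -> Tuple[Stump, float]:
    """Pick the threshold and polarity with the lowest (weighted) training error."""
    n = len(x)
    if w is None:
        w = [1.0 / n] * n  # unweighted case: every instance counts equally
    values = sorted(set(x))
    # Candidate thresholds: one below the minimum, then midpoints between neighbours.
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best, best_err = (candidates[0], 1), float("inf")
    for t in candidates:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w)
                      if stump_predict((t, polarity), xi) != yi)
            if err < best_err:
                best, best_err = (t, polarity), err
    return best, best_err


# Made-up petal widths in cm (illustrative only, not the real iris data).
petal_width = [1.0, 1.3, 1.4, 1.5, 1.7, 1.6, 1.8, 2.0, 2.3, 2.5]
species = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]  # -1 = versicolor, +1 = virginica

stump, error = fit_stump(petal_width, species)
print("threshold:", stump[0], "polarity:", stump[1], "training error:", error)
```

Because the two made-up classes overlap slightly, the best single threshold still misclassifies one flower. That is the kind of better-than-chance but imperfect classifier that boosting is designed to combine.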

answered Oct 07 '22 18:10

lennon310
