The following is the AdaBoost algorithm:

1. Input: training examples $(x_1, y_1), \ldots, (x_N, y_N)$ with labels $y_i \in \{-1, +1\}$.
2. Initialize the weights $w_i = 1/N$ for $i = 1, \ldots, N$.
3. For $t = 1, \ldots, T$:
   3.1. Fit a weak classifier $h_t(x)$ using weights $w_i$ on the training data.
   3.2. Compute the weighted error $\epsilon_t = \sum_i w_i \, I(h_t(x_i) \neq y_i)$.
   3.3. Compute $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$.
   3.4. Update $w_i \leftarrow w_i \exp(-\alpha_t y_i h_t(x_i))$ and renormalize so that $\sum_i w_i = 1$.
4. Output the final classifier $H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$.
It mentions "using weights wi on the training data" at part 3.1.
I am not very clear about how to use the weights. Should I resample the training data?
AdaBoost assigns a weight to each training example to determine its significance in the training set. Examples with high weights have a larger influence on the next classifier that is fit.
After each classifier is trained, the classifier's weight is calculated based on its accuracy. More accurate classifiers are given more weight. A classifier with 50% accuracy is given a weight of zero, and a classifier with less than 50% accuracy (kind of a funny concept) is given negative weight.
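To make this concrete, here is a small Python sketch (my own illustration, not from the original post) of the usual stage-weight formula $\alpha = \frac{1}{2} \ln \frac{1 - \epsilon}{\epsilon}$, where $\epsilon$ is the classifier's weighted error rate:

```python
import math

def stage_weight(error_rate):
    """Classifier (stage) weight in discrete AdaBoost."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

print(stage_weight(0.10))  # accurate classifier   -> large positive weight
print(stage_weight(0.50))  # coin-flip classifier  -> 0.0
print(stage_weight(0.70))  # worse than chance     -> negative weight
```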
The outputs of the individual weak learners are combined into a weighted sum that forms the final output of the boosted classifier. This adaptive reweighting of examples and classifiers is why AdaBoost stands for "Adaptive Boosting".
Making Predictions with AdaBoost

Predictions are made by calculating the weighted average of the weak classifiers. For a new input instance, each weak learner calculates a predicted value as either +1.0 or -1.0. The predicted values are weighted by each weak learner's stage value.
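As a minimal sketch of this prediction step (the stage values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical stage values and the +1/-1 outputs of three weak
# learners on one new input instance.
stage_values = np.array([0.9, 0.4, 0.7])
weak_outputs = np.array([+1.0, -1.0, +1.0])

score = np.dot(stage_values, weak_outputs)  # weighted sum = 1.2
prediction = np.sign(score)                 # final output = +1.0
print(score, prediction)
```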
> I am not clear on how to use these weights. Should I resample the training data?
It depends on what classifier you are using.
If your classifier can take instance weights (weighted training examples) into account, then you don't need to resample the data. Examples include a naive Bayes classifier that accumulates weighted counts, or a weighted k-nearest-neighbor classifier.
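For instance, a rough sketch with scikit-learn, assuming a classifier whose `fit()` accepts `sample_weight` (GaussianNB does):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Instance weights as maintained by AdaBoost (uniform here,
# just to illustrate the interface).
w = np.full(len(X), 1.0 / len(X))

clf = GaussianNB()
clf.fit(X, y, sample_weight=w)  # weights used directly; no resampling needed
```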
Otherwise, you should resample the data according to the instance weights: instances with large weights may be sampled multiple times, while instances with very small weights might not appear in the resampled training set at all. Most other classifiers fall into this category.
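A sketch of such weighted resampling (the data here is synthetic, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = rng.random(len(X))
w /= w.sum()  # instance weights from the boosting round, summing to 1

# Sample indices with probability proportional to the weights: heavy
# instances can appear several times, light ones may be dropped.
idx = rng.choice(len(X), size=len(X), replace=True, p=w)
X_res, y_res = X[idx], y[idx]
# Train any weight-unaware classifier on (X_res, y_res).
```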
Actually, in practice boosting performs better if you rely only on a pool of very naive classifiers, e.g., decision stumps or linear discriminants. In this case the algorithm you listed has an easy-to-implement form: at each round $t$, pick from the pool the weak classifier $h_t$ with the smallest weighted error $\epsilon_t$ ($\epsilon_t$ is defined similarly to yours), update the instance weights via $w_i \leftarrow w_i \exp(-\alpha_t y_i h_t(x_i))$ followed by renormalization, and output $H(x) = \operatorname{sign}\left( \sum_t \alpha_t h_t(x) \right)$, where $\alpha_t$ is chosen by $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$.
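One round of this form might look like the following sketch (my own Python; the pool is represented as a list of functions returning predictions in {-1, +1}):

```python
import numpy as np

def boost_round(pool, X, y, w):
    """One boosting round over a fixed pool of weak classifiers.

    pool: list of callables, each mapping X -> predictions in {-1, +1}
    y:    labels in {-1, +1};  w: instance weights summing to 1
    Returns the chosen classifier, its alpha, and the updated weights.
    """
    errors = np.array([np.sum(w * (h(X) != y)) for h in pool])
    best = int(np.argmin(errors))
    eps = float(np.clip(errors[best], 1e-12, 1 - 1e-12))  # numerical guard
    alpha = 0.5 * np.log((1.0 - eps) / eps)
    w = w * np.exp(-alpha * y * pool[best](X))
    return pool[best], alpha, w / w.sum()
```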
Define a two-class problem in the plane (for example, a circle of points inside a square) and build a strong classifier out of a pool of randomly generated linear discriminants of the type $\operatorname{sign}(a x_1 + b x_2 + c)$.
The two class labels are represented by red crosses and blue dots, and a set of linear discriminants (the yellow lines) forms the pool of naive/weak classifiers. We generate 1000 data points for each class (inside the circle or not), and 20% of the data is reserved for testing.
This is the classification result I got on the test dataset using 50 linear discriminants: the training error is 1.45% and the testing error is 2.3%.
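Here is a self-contained sketch of that experiment (the circle radius, pool size, and random seed are my own choices, so the exact error numbers will differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-class problem in the plane: +1 inside a circle, -1 elsewhere in the square.
pts = rng.uniform(-1.0, 1.0, size=(10000, 2))
inside = np.sum(pts**2, axis=1) < 0.5**2
X = np.vstack([pts[inside][:1000], pts[~inside][:1000]])
y = np.concatenate([np.ones(1000), -np.ones(1000)])

# Shuffle and reserve 20% of the data for testing.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
n_train = int(0.8 * len(X))
X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

# Pool of randomly generated linear discriminants sign(a*x1 + b*x2 + c).
coeffs = rng.normal(size=(500, 3))  # one (a, b, c) row per pool member

def pool_predict(X):
    """Predictions of every pool member on X; shape (pool size, n samples)."""
    raw = coeffs[:, :2] @ X.T + coeffs[:, 2:]
    return np.where(raw >= 0.0, 1.0, -1.0)

# Boost for 50 rounds, each round keeping the member with lowest weighted error.
w = np.full(n_train, 1.0 / n_train)
chosen, alphas = [], []
preds_tr = pool_predict(X_tr)
for _ in range(50):
    errors = (preds_tr != y_tr) @ w
    best = int(np.argmin(errors))
    eps = float(np.clip(errors[best], 1e-12, 1 - 1e-12))
    alpha = 0.5 * np.log((1.0 - eps) / eps)
    w *= np.exp(-alpha * y_tr * preds_tr[best])
    w /= w.sum()
    chosen.append(best)
    alphas.append(alpha)

def strong_predict(X):
    """Weighted vote of the 50 selected discriminants."""
    scores = np.array(alphas) @ pool_predict(X)[chosen]
    return np.where(scores >= 0.0, 1.0, -1.0)

print("training error:", np.mean(strong_predict(X_tr) != y_tr))
print("testing error: ", np.mean(strong_predict(X_te) != y_te))
```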