How to use weights when training a weak learner for AdaBoost

The following is the AdaBoost algorithm: [image: AdaBoost algorithm pseudocode]
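For reference, the standard discrete AdaBoost iteration (which the missing image presumably showed, with its own step numbering) is: for t = 1, ..., T, train a weak learner h_t using weights w_i on the training data; compute the weighted error eps_t = sum_i w_i * [h_t(x_i) != y_i]; set alpha_t = 0.5 * ln((1 - eps_t) / eps_t); then update w_i <- w_i * exp(-alpha_t * y_i * h_t(x_i)) and renormalize. The final classifier is H(x) = sign(sum_t alpha_t * h_t(x)).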

It mentions "using weights w_i on the training data" in step 3.1.

I am not clear on how to use the weights. Should I resample the training data?

asked Aug 05 '13 by tidy

People also ask

How are weights used in AdaBoost?

AdaBoost assigns a weight to each training example to determine its significance in the training dataset. Training examples with high weights have a larger say in fitting the next weak learner.

How does AdaBoost select and weight weak classifiers?

After each classifier is trained, the classifier's weight is calculated based on its accuracy. More accurate classifiers are given more weight. A classifier with 50% accuracy is given a weight of zero, and a classifier with less than 50% accuracy (kind of a funny concept) is given a negative weight.
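In the standard formulation this weight is alpha_t = 0.5 * ln((1 - eps_t) / eps_t), where eps_t is the classifier's weighted error rate, so the weight is zero at 50% accuracy (chance level for two classes) and negative below that.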

How do we combine the results of individual weak learners in AdaBoost?

The outputs of the individual weak learners are combined as a weighted sum, which forms the final output of the boosted classifier. AdaBoost stands for “Adaptive Boosting”.

Does AdaBoost take a weighted vote / weighted average of the weak learners for the final prediction?

Predictions are made by calculating the weighted average of the weak classifiers. For a new input instance, each weak learner calculates a predicted value as either +1.0 or -1.0. The predicted values are weighted by each weak learner's stage value.
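For example, suppose three weak learners have stage values 0.9, 0.6, and 0.3 (made-up numbers) and predict +1.0, -1.0, and +1.0 on some instance. The combined score is 0.9 - 0.6 + 0.3 = 0.6, which is positive, so the final prediction is +1.0.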


1 Answer

I am not clear on how to use the weights. Should I resample the training data?

It depends on what classifier you are using.

If your classifier can take instance weights (weighted training examples) into account, then you don't need to resample the data. Examples are a naive Bayes classifier that accumulates weighted counts, or a weighted k-nearest-neighbor classifier.
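For instance, a minimal sketch with scikit-learn, many of whose estimators accept per-instance weights directly through the sample_weight argument (the data here is made up):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                # toy features
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # toy labels in {-1, +1}
    w = np.full(len(y), 1.0 / len(y))            # AdaBoost instance weights

    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)             # weighted fit, no resampling needed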

Otherwise, you want to resample the data according to the instance weights: instances with large weights can be sampled multiple times, while instances with small weights may not appear in the resampled training set at all. Most other classifiers fall into this category.
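A minimal sketch of weighted resampling (toy data again; any unweighted classifier can then be trained on the resampled set):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                # toy features
    y = np.where(X[:, 0] > 0, 1, -1)             # toy labels
    w = rng.random(100)
    w /= w.sum()                                 # weights must sum to 1

    idx = rng.choice(len(X), size=len(X), p=w)   # heavy instances drawn more often
    X_res, y_res = X[idx], y[idx]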

In Practice

In practice, boosting performs better if you rely only on a pool of very naive classifiers, e.g., decision stumps or linear discriminants. In this case, the algorithm you listed has an easy-to-implement form (see here for details):

[image: simplified AdaBoost pseudocode]

where alpha_t is chosen as alpha_t = 0.5 * ln((1 - eps_t) / eps_t), and eps_t (the weighted error rate) is defined the same way as in your algorithm.
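Following that form, here is a minimal, self-contained Python sketch of discrete AdaBoost over decision stumps (all function and variable names are mine, not from any particular library, and labels are assumed to be in {-1, +1}):

    import numpy as np

    def fit_stump(X, y, w):
        # Exhaustive search for the (feature, threshold, polarity) stump
        # with the lowest *weighted* training error.
        best_err, best_stump = np.inf, None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] <= thr, 1, -1)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best_stump = err, (j, thr, pol)
        return best_err, best_stump

    def stump_predict(stump, X):
        j, thr, pol = stump
        return pol * np.where(X[:, j] <= thr, 1, -1)

    def adaboost(X, y, T=50):
        n = len(y)
        w = np.full(n, 1.0 / n)                    # start with uniform weights
        ensemble = []
        for _ in range(T):
            eps, stump = fit_stump(X, y, w)
            eps = np.clip(eps, 1e-12, 1 - 1e-12)   # guard the log
            alpha = 0.5 * np.log((1 - eps) / eps)  # the formula above
            pred = stump_predict(stump, X)
            w = w * np.exp(-alpha * y * pred)      # up-weight the mistakes
            w /= w.sum()                           # renormalize
            ensemble.append((alpha, stump))
        return ensemble

    def predict(ensemble, X):
        score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
        return np.sign(score)

Note that the weak learner only needs to minimize the weighted error; the reweighting step is what forces later stumps to focus on the currently misclassified instances.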

An Example

Define a two-class problem in the plane (for example, a circle of points inside a square) and build a strong classifier out of a pool of randomly generated linear discriminants of the type sign(ax1 + bx2 + c).

The two class labels are represented by red crosses and blue dots. Here we use a set of linear discriminants (the yellow lines) as the pool of naive/weak classifiers. We generate 1000 data points for each class (inside the circle or not), and 20% of the data is reserved for testing.
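A rough sketch of this setup (the radius, pool size, and all names are assumptions on my part, not the exact values behind the figures below):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(2000, 2))       # points in the square
    r = np.sqrt(2 / np.pi)                       # radius ~0.80 roughly balances the classes
    y = np.where(np.hypot(X[:, 0], X[:, 1]) < r, 1, -1)

    cut = int(0.8 * len(X))                      # hold out 20% for testing
    X_tr, y_tr = X[:cut], y[:cut]
    X_te, y_te = X[cut:], y[cut:]

    pool = rng.normal(size=(500, 3))             # random (a, b, c) triples

    def disc_predict(params, X):
        a, b, c = params
        return np.sign(a * X[:, 0] + b * X[:, 1] + c)

    def best_of_pool(X, y, w):
        # In each boosting round, pick the pool member with the lowest
        # weighted error and plug it into the loop sketched earlier.
        errs = [w[disc_predict(p, X) != y].sum() for p in pool]
        return pool[int(np.argmin(errs))]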

[image: training points (red crosses, blue dots) and the pool of linear discriminants (yellow lines)]

This is the classification result I got on the test dataset, using 50 linear discriminants. The training error is 1.45% and the test error is 2.3%.

[image: classification result on the test set]

answered Oct 05 '22 by greeness