a.k.a. Adaptive Boosting
- Boosting method (sequential ensemble)
- each new predictor pays more attention to the training instances that its predecessor underfitted
- the weights of misclassified instances are increased so that the next predictor puts more emphasis on them
- training cannot be parallelized, since each predictor depends on the results of the previous one
Algorithm
- initialize each instance weight w(i) = 1/m
- m = total number of points
- train 1st predictor
- for predictor j, compute the weighted error rate rj on the training data
- add up the weights of the misclassified instances (relative to the total weight)
- predictor weight αj
- the more accurate the predictor, the higher its weight
- a predictor no better than random guessing ⇒ weight close to 0
- update the instance weights w(i)
- misclassified instances are weighted more
- normalize all weights
- train the next predictor on the reweighted instances and repeat until k predictors have been trained
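
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. It assumes binary labels in {-1, +1}, decision stumps as the base predictors, and one common formulation of the formulas the notes leave out: αj = η · log((1 − rj) / rj), with misclassified weights multiplied by exp(αj). The names `train_stump`, `stump_predict`, `adaboost_fit`, and the learning rate `eta` are hypothetical helpers introduced for this example.

```python
import math

def train_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[f] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best[1:]

def stump_predict(stump, row):
    f, thr, sign = stump
    return sign if row[f] <= thr else -sign

def adaboost_fit(X, y, k=5, eta=1.0):
    """Train k stumps sequentially; returns a list of (alpha, stump) pairs."""
    m = len(X)
    w = [1.0 / m] * m                      # initial instance weights w(i) = 1/m
    ensemble = []
    for _ in range(k):
        stump = train_stump(X, y, w)
        miss = [stump_predict(stump, row) != yi for row, yi in zip(X, y)]
        # weighted error rate r_j: weight of the misclassified instances
        r = sum(wi for wi, mi in zip(w, miss) if mi) / sum(w)
        # assumption of this sketch: clip r so the log below stays finite
        r = min(max(r, 1e-10), 1 - 1e-10)
        alpha = eta * math.log((1 - r) / r)  # assumed predictor-weight formula
        ensemble.append((alpha, stump))
        # boost the weights of misclassified instances, then normalize
        w = [wi * math.exp(alpha) if mi else wi for wi, mi in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble
```

Under this formula, a stump with rj = 0.5 (random guessing) gets αj = 0, and αj grows as rj approaches 0, which matches the bullets above. Because each round needs the weights produced by the previous round, the loop is inherently sequential.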
Inference mode
- compute the predictions of all the predictors and weight them by the predictor weights αj
- predicted class → the class that receives the majority of the weighted votes
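
A minimal sketch of the weighted vote, assuming each predictor is simply a callable that returns a class label. The function name `weighted_vote` and the toy predictors and weights are hypothetical and chosen only to show that a single high-weight predictor can outvote several low-weight ones.

```python
from collections import defaultdict

def weighted_vote(predictors, alphas, x):
    """Return the class whose supporting predictors have the largest total weight."""
    scores = defaultdict(float)
    for predict, alpha in zip(predictors, alphas):
        scores[predict(x)] += alpha        # each vote counts alpha_j, not 1
    return max(scores, key=scores.get)

# Hypothetical toy ensemble: one strong predictor says "a", two weak ones say "b".
predictors = [lambda x: "a", lambda x: "b", lambda x: "b"]
alphas = [1.5, 0.6, 0.7]

print(weighted_vote(predictors, alphas, x=None))  # prints "a" (1.5 vs 0.6 + 0.7 = 1.3)
```

With equal weights the same ensemble would return "b", which is the plain majority vote.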