a.k.a. Adaptive Boosting

  • Boosting method (sequential ensemble)
  • each new predictor pays more attention to the training instances that its predecessor underfitted
  • weights of the misclassified instances are increased so that the next predictor puts more emphasis on them
  • cannot be parallelized, since each predictor can only be trained after the previous one has been trained and evaluated (a minimal usage sketch follows this list)
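
A minimal usage sketch with scikit-learn's AdaBoostClassifier; the dataset, hyperparameter values, and variable names below are illustrative assumptions, not from these notes:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The default base estimator is a depth-1 decision tree (a "decision stump").
ada_clf = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)
print("test accuracy:", ada_clf.score(X_test, y_test))
```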

Algorithm

  • initialize each instance weight w(i) = 1/m
    • m = total number of training instances
  • train 1st predictor
  • for predictor j, the weighted error rate rj is calculated on the training set
    • sum of the weights of the misclassified instances, divided by the sum of all weights
  • predictor weight αj = η log((1 − rj) / rj), where η is the learning rate
    • the more accurate the predictor, the higher its weight
    • a predictor that only guesses randomly (rj ≈ 0.5) gets a weight close to 0; one that does worse than random gets a negative weight
  • update the instance weights w(i)
    • weights of misclassified instances are boosted (multiplied by exp(αj))
    • all weights are then normalized
  • repeat the process on the next predictor with the reweighted instances, until k predictors have been trained (a from-scratch sketch of this loop follows)
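
A from-scratch sketch of this training loop, using scikit-learn decision stumps as the base predictors; the function name adaboost_fit, the binary labels in {-1, +1}, and the default constants are assumptions for illustration:

```python
# Sketch of the AdaBoost training loop described above (binary labels in {-1, +1}).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_predictors=50, learning_rate=1.0):
    m = len(X)
    w = np.full(m, 1 / m)                        # each instance weight w(i) = 1/m
    predictors, alphas = [], []
    for j in range(n_predictors):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)         # train predictor j on the weighted data
        missed = stump.predict(X) != y
        r_j = np.sum(w[missed]) / np.sum(w)      # weighted error rate rj
        r_j = np.clip(r_j, 1e-10, 1 - 1e-10)     # guard against log(0) / division by zero
        alpha_j = learning_rate * np.log((1 - r_j) / r_j)  # predictor weight αj
        w[missed] *= np.exp(alpha_j)             # boost weights of misclassified instances
        w /= w.sum()                             # normalize all weights
        predictors.append(stump)
        alphas.append(alpha_j)
    return predictors, alphas
```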

Inference mode

  • compute the predictions of all the predictors and weight them by their predictor weights αj
  • predicted class → weighted majority vote (the class that receives the largest sum of αj votes)
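
A matching inference sketch: each stump's vote is scaled by its αj and the sign of the weighted sum is the prediction (assumes the predictors/alphas lists from the adaboost_fit sketch above and binary labels in {-1, +1}):

```python
import numpy as np

def adaboost_predict(X, predictors, alphas):
    # Each predictor votes -1 or +1; its vote is scaled by its weight αj.
    scores = sum(alpha * clf.predict(X) for alpha, clf in zip(alphas, predictors))
    return np.sign(scores)   # weighted majority vote
```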