• We show that AdaBoost fits an additive model in a base learner, optimizing a novel exponential loss function. This loss function is very similar to the (negative) binomial log-likelihood (Sections 10.2–10.4).
• The population minimizer of the exponential loss function is shown to be the log-odds of the class probabilities (Section 10.5).
• We describe loss functions for regression and classification that are more robust than squared error or exponential loss (Section 10.6).
• It is argued that decision trees are an ideal base learner for data mining applications of boosting (Sections 10.7 and 10.9).
• We develop a class of gradient boosted models (GBMs) for boosting trees with any loss function (Section 10.10).
• The importance of “slow learning” is emphasized; it is implemented by shrinking each new term that enters the model (Section 10.12), and further by randomization (Section 10.12.2).
• Tools for interpretation of the fitted model are described (Section 10.13).
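The population minimizer in the second bullet can be derived in a few lines. A sketch, using the chapter's convention that $Y \in \{-1, 1\}$: we seek
\[
f^*(x) = \arg\min_{f(x)} \mathrm{E}_{Y\mid x}\left[e^{-Y f(x)}\right],
\]
where
\[
\mathrm{E}_{Y\mid x}\left[e^{-Y f(x)}\right] = \Pr(Y=1\mid x)\,e^{-f(x)} + \Pr(Y=-1\mid x)\,e^{f(x)}.
\]
Differentiating with respect to $f(x)$ and setting the derivative to zero gives
\[
f^*(x) = \frac{1}{2}\log\frac{\Pr(Y=1\mid x)}{\Pr(Y=-1\mid x)},
\]
that is, one-half the log-odds of the class probabilities.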
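To make the gradient boosting and shrinkage bullets concrete, here is a minimal self-contained sketch, not the book's implementation: squared-error loss on a one-dimensional feature, with depth-one regression stumps as the base learner. All function and parameter names (`fit_stump`, `gbm_fit`, `shrinkage`, `n_trees`) are illustrative assumptions. Each round fits a stump to the negative gradient of the loss (for squared error, simply the residuals) and adds it to the model scaled by the shrinkage factor, which is the "slow learning" idea.

```python
def fit_stump(x, r):
    """Fit the best single-split regression stump predicting r from 1-d x."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        left = [r[order[i]] for i in range(k)]
        right = [r[order[i]] for i in range(k, len(x))]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        # Sum of squared errors for this split point.
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        thresh = (x[order[k - 1]] + x[order[k]]) / 2
        if best is None or sse < best[0]:
            best = (sse, thresh, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def gbm_fit(x, y, n_trees=50, shrinkage=0.1):
    """Gradient boosting with squared-error loss and stump base learners."""
    f0 = sum(y) / len(y)          # initial fit: the mean
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_trees):
        # Negative gradient of squared-error loss = residuals.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        # Shrinkage: add only a fraction of each new term ("slow learning").
        pred = [pi + shrinkage * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + shrinkage * sum(s(xi) for s in stumps)

# Toy data: a step function.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = gbm_fit(x, y)
```

With enough rounds, the shrunken stump ensemble recovers the step; lowering `shrinkage` slows convergence, which in practice is traded off against `n_trees` as discussed in Section 10.12.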