At each split, consider only max_features features, selected at random.
Different implementations can be used, e.g. in scikit-learn:

- BaggingClassifier: choose your own base model and sampling procedure
- RandomForestClassifier: default implementation, many options
- ExtraTreesClassifier: uses extremely randomized trees

Most important parameters:

- n_estimators (>100; higher is better, but with diminishing returns)
- max_features
- max_depth, min_samples_split, ...
- Easy to parallelize (set n_jobs to -1)
- random_state (controls the bootstrap samples) for reproducibility
- Out-of-bag error estimate: oob_error = 1 - clf.oob_score_ (see the example below)
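A minimal sketch of how these options fit together (the dataset and parameter values are illustrative, not from the original notebook):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

clf = RandomForestClassifier(
    n_estimators=200,      # >100, higher is better but diminishing returns
    max_features="sqrt",   # number of randomly selected features per split
    max_depth=None,        # can be limited to regularize
    n_jobs=-1,             # parallelize over all cores
    oob_score=True,        # compute out-of-bag score
    random_state=0,        # reproducible bootstrap samples
)
clf.fit(X, y)

oob_error = 1 - clf.oob_score_  # out-of-bag error estimate
print(f"OOB error: {oob_error:.3f}")
```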
For iteration $m = 1..M$:

- Compute the pseudo-residuals (negative gradient of the loss) of the current ensemble
- Fit a new tree to these pseudo-residuals
- Add it to the ensemble, scaled by the learning rate
Early stopping (optional): stop when performance on a validation set does not improve for a given number of iterations.
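As an illustration, a minimal sketch of this loop for regression with square loss (dataset, tree depth, and learning rate are assumptions for the example, not values from the lecture):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10, random_state=0)

M, learning_rate = 60, 0.1
prediction = np.full(len(y), y.mean())    # initial model: constant prediction
trees = []

for m in range(M):                        # for iteration m = 1..M
    residuals = y - prediction            # pseudo-residuals (negative gradient of square loss)
    tree = DecisionTreeRegressor(max_depth=3, random_state=m).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)  # add the scaled correction
    trees.append(tree)
```

Each tree corrects the remaining errors of the ensemble so far; a lower learning rate needs more iterations but tends to generalize better.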
Main hyperparameters:
- n_estimators: higher is better, but will eventually start to overfit
- learning_rate: lower rates mean more trees are needed to get more complex models
    - Either set n_estimators as high as possible, then tune learning_rate,
    - or fix the learning_rate and use early stopping to avoid overfitting
- max_depth: typically kept low (<5), reduce when overfitting
- max_features: can also be tuned, similar to random forests
- n_iter_no_change: early stopping; the algorithm stops if the improvement is less than a tolerance tol for more than n_iter_no_change iterations (see the sketch below)

Fast, scalable implementations:

- XGBoost uses approximate (binned) split finding (parameter sketch_eps, default 0.03)
- scikit-learn's HistGradientBoostingClassifier is similar
- The xgboost Python package is sklearn-compatible: conda install -c conda-forge xgboost
- LightGBM and CatBoost are other fast boosting techniques (see the summary table below)
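A hedged sketch of the hyperparameters above; the values are illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, HistGradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

gbc = GradientBoostingClassifier(
    n_estimators=500,        # higher is better, but will start to overfit
    learning_rate=0.05,      # lower rate -> more trees needed
    max_depth=3,             # typically kept low (<5)
    max_features="sqrt",     # can be tuned, similar to random forests
    n_iter_no_change=10,     # stop if improvement < tol for 10 iterations
    tol=1e-4,
    validation_fraction=0.1, # held-out fraction used for early stopping
    random_state=0,
).fit(X, y)

# Histogram-based variant, much faster on larger datasets
hgb = HistGradientBoostingClassifier(
    max_iter=500, learning_rate=0.05, early_stopping=True, random_state=0
).fit(X, y)
```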

| Name | Representation | Loss function | Optimization | Regularization |
|---|---|---|---|---|
| Classification trees | Decision tree | Entropy / Gini index | Hunt's algorithm | Tree depth,... |
| Regression trees | Decision tree | Square loss | Hunt's algorithm | Tree depth,... |
| RandomForest | Ensemble of randomized trees | Entropy / Gini / Square | (Bagging) | Number/depth of trees,... |
| AdaBoost | Ensemble of stumps | Exponential loss | Greedy search | Number/depth of trees,... |
| GradientBoostingRegression | Ensemble of regression trees | Square loss | Gradient descent | Number/depth of trees,... |
| GradientBoostingClassification | Ensemble of regression trees | Log loss | Gradient descent | Number/depth of trees,... |
| XGBoost, LightGBM, CatBoost | Ensemble of XGBoost trees | Square/log loss | 2nd order gradients | Number/depth of trees,... |
| Stacking | Ensemble of heterogeneous models | / | / | Number of models,... |