Bagging (Bootstrap Aggregation)

A machine learning ensemble method used to improve the accuracy and stability of algorithms in regression and statistical classification

Written by CFI Team


What is Bagging (Bootstrap Aggregation)?

Ensemble machine learning can be broadly categorized into bagging and boosting. The bagging technique is useful for both regression and statistical classification. Bagging is commonly used with decision trees, where it raises the stability and accuracy of models by reducing variance, which helps mitigate overfitting.

Figure 1. Bagging (Bootstrap Aggregation) Flow.

Bagging in ensemble machine learning trains several weak models and aggregates their predictions into a single final prediction. The weak models specialize in distinct sections of the feature space, which lets bagging draw on the predictions of every model to reach the final result.

Summary

Bagging and boosting are the two main methods of ensemble machine learning.
Bagging is an ensemble method that can be used in regression and classification.
It is also known as bootstrap aggregation, a name that reflects its two components: bootstrapping and aggregation.

What is Bootstrapping?

Bagging is composed of two parts: bootstrapping and aggregation. Bootstrapping is a sampling method in which a sample is drawn from a data set with replacement. The learning algorithm is then run on each of the selected samples.

The bootstrapping technique samples with replacement so that every draw is made from the full data set and is independent of the draws before it. When a sample is selected without replacement, each subsequent selection depends on the previous selections, so the draws are no longer independent.
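The difference between the two sampling schemes can be illustrated with a minimal Python sketch. The data set, the seed, and the helper name bootstrap_sample are assumptions made up for this illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data, sampling with replacement."""
    return rng.choices(data, k=len(data))

rng = random.Random(0)  # fixed seed so the run is repeatable
data = list(range(10))  # toy data set (assumed for illustration)

# With replacement: the same item may be drawn more than once.
with_replacement = bootstrap_sample(data, rng)

# Without replacement: drawing all 10 items is just a shuffle of the data.
without_replacement = rng.sample(data, k=len(data))

# Items never drawn into the bootstrap sample are called "out-of-bag".
out_of_bag = sorted(set(data) - set(with_replacement))

print("bootstrap sample:   ", with_replacement)
print("out-of-bag items:   ", out_of_bag)
print("without replacement:", without_replacement)
```

A bootstrap sample typically repeats some observations and leaves others out, which is why each model in a bagging ensemble sees a slightly different version of the training data.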

What is Aggregation?

The predictions of the individual models are aggregated into a final prediction so that all possible outcomes are considered. For classification, the aggregation can be based on the number of models predicting each outcome (majority voting) or on the average of the predicted probabilities across the models trained on the bootstrap samples. For regression, the predictions are typically averaged.
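The two ways of aggregating classification predictions can be sketched as follows. The probabilities below are hypothetical values chosen to show that counting votes and averaging probabilities do not always agree, and the function names are illustrative only.

```python
from collections import Counter

def majority_vote(labels):
    """Count-based aggregation: return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]

def average_probabilities(prob_rows):
    """Probability-based aggregation: average each class's probability."""
    n_models = len(prob_rows)
    classes = prob_rows[0].keys()
    return {c: sum(row[c] for row in prob_rows) / n_models for c in classes}

# Hypothetical predicted probabilities from three models for one observation.
model_probs = [
    {"yes": 0.55, "no": 0.45},
    {"yes": 0.60, "no": 0.40},
    {"yes": 0.10, "no": 0.90},
]

# Each model's own label is its most probable class.
hard_labels = [max(p, key=p.get) for p in model_probs]
hard_result = majority_vote(hard_labels)

# Average the probabilities first, then pick the most probable class.
avg = average_probabilities(model_probs)
soft_result = max(avg, key=avg.get)

print(hard_result, soft_result)
```

Here two of the three models vote "yes", but the third is so confident in "no" that the averaged probabilities favor "no". Which rule is preferable depends on how well calibrated the individual models' probabilities are.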

What is an Ensemble Method?

Bagging and boosting are the most prominent ensemble techniques. An ensemble method is a machine learning technique that trains multiple models using the same learning algorithm. Ensemble methods belong to a broader group of multi-classifiers.

Multi-classifiers are groups of multiple learners, sometimes running into the thousands, whose outputs are fused to solve a common problem. Another category of multi-classifiers is hybrid methods. Hybrid methods also use a set of learners but, unlike ensemble methods, the learners can use distinct learning methods.

Learning faces multiple challenges, such as errors that are mainly due to bias, noise, and variance. Ensemble methods such as bagging and boosting improve the accuracy and stability of machine learning models. Combinations of multiple classifiers reduce variance, especially where the classifiers are unstable, and they tend to give more reliable results than a single classifier.

The application of either bagging or boosting requires the selection of a base learner algorithm first. For example, if one chooses a classification tree, then bagging or boosting would each produce a pool of trees whose size is chosen by the user.
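As a rough sketch of how a base learner and a pool size fit together, the following Python example bags one-split classification trees (decision stumps) written from scratch. The toy data, the function names, and the n_estimators parameter name are assumptions for illustration. A practical project would normally rely on an established library rather than this simplified code.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split tree on a single numeric feature.

    Returns (threshold, left_label, right_label) with the fewest training errors.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right of t, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def fit_bagging(xs, ys, n_estimators, seed=0):
    """Train n_estimators stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    pool = []
    for _ in range(n_estimators):
        idx = rng.choices(range(n), k=n)  # row indices drawn with replacement
        pool.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return pool

def predict_bagging(pool, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = [predict_stump(stump, x) for stump in pool]
    return Counter(votes).most_common(1)[0][0]

# Toy data (made up): the label is mostly 1 when x is large, with one noisy point.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

pool = fit_bagging(xs, ys, n_estimators=25)  # pool size chosen by the user
print(len(pool), predict_bagging(pool, 1.5), predict_bagging(pool, 9.5))
```

The pool size is the only tuning choice shown here. Larger pools cost more training time, while the base learner itself stays unchanged.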

Advantages and Disadvantages of Bagging

Random forest is one of the most popular bagging algorithms. Bagging offers the advantage of allowing many weak learners to combine efforts to outdo a single strong learner. It also helps reduce variance, which lessens the overfitting of models in the procedure.
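The variance reduction can be illustrated with a small simulation. This is a simplified sketch that assumes the models' errors are independent Gaussian noise around a true value of 10. Models trained on bootstrap samples of the same data are correlated in practice, so the real reduction is smaller than the one shown here.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def noisy_estimate():
    """One unstable model's prediction: true value 10 plus Gaussian noise."""
    return 10 + rng.gauss(0, 2)

def averaged_estimate(n_models):
    """Average the predictions of n_models independent noisy models."""
    return statistics.fmean([noisy_estimate() for _ in range(n_models)])

# Compare 2,000 single-model predictions with 2,000 averages of 25 models.
single = [noisy_estimate() for _ in range(2000)]
averaged = [averaged_estimate(25) for _ in range(2000)]

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)

print(f"variance of a single model:     {var_single:.3f}")
print(f"variance of a 25-model average: {var_averaged:.3f}")
```

Under the independence assumption, averaging 25 models divides the variance by roughly 25, while the average prediction stays centered on the same value. Averaging alone therefore stabilizes predictions but does not correct bias.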

One disadvantage of bagging is the loss of model interpretability. The resulting model can also be highly biased when the proper procedure is not followed. Although bagging can be highly accurate, it is computationally expensive, which may discourage its use in some cases.

Bagging vs. Boosting

The choice between bagging and boosting depends on the available data, the simulation, and the circumstances at hand. Both techniques reduce the variance of an estimate by combining several models, which increases accuracy. The combined results are therefore more stable than the results of the individual models.

When a single model suffers from low performance, bagging will rarely produce a better bias. Boosting, however, can generate a combined model with lower errors, since it concentrates on building on the strengths and correcting the shortcomings of a single model.

When the challenge in a single model is overfitting, bagging performs better than boosting. Boosting does not help with overfitting, since it is itself prone to overfitting.