Bias and variance are two fundamental sources of prediction error in supervised learning. Understanding how to estimate them helps guide model selection and tuning.
Understanding Bias and Variance
Bias refers to the error introduced by approximating a real-world problem with a simplified model. Variance measures how much the model’s predictions fluctuate for different training datasets. Both influence the model’s accuracy and generalization ability.
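For squared-error loss, these two quantities relate to the model's expected error through the standard bias–variance decomposition. At a fixed test point x, with f̂ the learned model and σ² the irreducible noise variance, the expected squared error splits as:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

The expectations are taken over training datasets, which is why the calculations below retrain the model on multiple datasets and aggregate the predictions.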
Calculating Bias
Bias is calculated by comparing the average prediction of the model to the true value. The steps include:
- Train the model multiple times on different training datasets.
- Predict the output for a fixed test point each time.
- Calculate the average of these predictions.
- Compute the difference between this average and the true value.
- Square this difference to obtain the bias squared.
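The steps above can be sketched in a few lines of Python. The prediction values here are illustrative placeholders, not results from a real experiment:

```python
def bias_squared(predictions, true_value):
    """Squared difference between the average prediction and the true value.

    `predictions` holds the outputs of models trained on different
    training datasets, all evaluated at the same fixed test point.
    """
    avg_prediction = sum(predictions) / len(predictions)
    return (true_value - avg_prediction) ** 2

# Hypothetical predictions from five retrained models at one test point
preds = [3.2, 3.8, 3.5, 3.7, 3.3]
print(bias_squared(preds, true_value=4.0))  # bias squared is about 0.25
```

Note that this estimates bias from a finite number of retrainings; the true bias is defined over the distribution of all possible training datasets.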
Calculating Variance
Variance measures the variability of the model’s predictions. To compute it:
- Use the predictions from multiple models trained on different datasets.
- Calculate the mean prediction across all models.
- Determine the squared deviation of each prediction from this mean.
- Average these squared deviations to find the variance.
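A minimal sketch of the variance calculation, using the same illustrative predictions as before:

```python
def variance(predictions):
    """Average squared deviation of each prediction from the mean prediction."""
    mean_pred = sum(predictions) / len(predictions)
    return sum((p - mean_pred) ** 2 for p in predictions) / len(predictions)

# Hypothetical predictions from five retrained models at one test point
preds = [3.2, 3.8, 3.5, 3.7, 3.3]
print(variance(preds))  # roughly 0.052
```

This is the population variance (dividing by n); dividing by n - 1 instead would give the unbiased sample variance, which matters little once the number of retrainings is large.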
Practical Example
Suppose you have a dataset and train a model five times on different subsets. For a specific test point, the predictions are 3.2, 3.8, 3.5, 3.7, and 3.3. The true value is 4.0.
The average prediction is (3.2 + 3.8 + 3.5 + 3.7 + 3.3) / 5 = 3.5. The bias squared is (4.0 – 3.5)^2 = 0.25. The variance is the average of the squared deviations of each prediction from 3.5: (0.09 + 0.09 + 0 + 0.04 + 0.04) / 5 = 0.052.
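The worked example can be checked with NumPy, which makes the mean and squared-deviation steps explicit:

```python
import numpy as np

# Predictions from five models trained on different subsets,
# all evaluated at the same test point (values from the example above)
preds = np.array([3.2, 3.8, 3.5, 3.7, 3.3])
true_value = 4.0

avg = preds.mean()                 # average prediction: 3.5
bias_sq = (true_value - avg) ** 2  # squared bias: 0.25
var = ((preds - avg) ** 2).mean()  # variance: 0.052 (same as preds.var())

print(f"bias^2 = {bias_sq:.3f}, variance = {var:.3f}")
```

Here `((preds - avg) ** 2).mean()` is exactly the "average squared deviation from the mean" described above, and NumPy's built-in `preds.var()` computes the same quantity.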