Bias vs Variance is one of the most important concepts in Machine Learning interviews because it explains why models fail and how to improve them.
Bias (Underfitting problem)
Definition:
Bias is the error due to overly simplistic assumptions in the learning algorithm.
In simple terms:
The model is too simple to capture the patterns in data.
Example:
- Using a linear regression model for a highly complex, non-linear dataset.
- The model ignores important patterns.
Behavior:
- High training error
- High test error
- Poor performance on both
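A minimal sketch of this behavior, assuming a synthetic dataset where the true relationship is y = x^2 (the data, the noise level, and the helper names fit_line, mse and make_data are all made up for illustration). A straight line cannot represent the curve, so both errors should stay far above the noise floor:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def mse(xs, ys, intercept, slope):
    """Mean squared error of the line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng, noise_sd=0.1):
    # Assumed ground truth: y = x^2 plus Gaussian noise, x uniform on [-3, 3].
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

rng = random.Random(0)
x_train, y_train = make_data(200, rng)
x_test, y_test = make_data(200, rng)

intercept, slope = fit_line(x_train, y_train)
train_mse = mse(x_train, y_train, intercept, slope)
test_mse = mse(x_test, y_test, intercept, slope)
noise_floor = 0.1 ** 2  # lowest MSE any model could reach given the noise

print(f"train MSE: {train_mse:.2f}, test MSE: {test_mse:.2f}, noise floor: {noise_floor:.2f}")
```

The signature of underfitting is that training and test error are both high and close to each other, so more data alone would not help here.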
Interview phrase:
“High bias leads to underfitting.”
Variance (Overfitting problem)
Definition:
Variance is the error due to the model being too sensitive to training data fluctuations.
In simple terms:
The model learns the training data too well, including noise.
Example:
- A very deep decision tree that memorizes training data.
- Performs well on training data but fails on unseen data.
Behavior:
- Very low training error
- High test error
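A small sketch of memorization, under stated assumptions: instead of a real decision tree it uses a 1-nearest-neighbour regressor as a stand-in, because in one dimension it behaves much like a fully grown tree (one prediction per training point). The dataset and the helper names are hypothetical.

```python
import random

def make_data(n, rng, noise_sd=0.5):
    # Assumed ground truth: y = x plus Gaussian noise, x uniform on [0, 10].
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def predict_1nn(x_train, y_train, x):
    """Return the label of the closest training point (pure memorization)."""
    nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]

def mse(x_train, y_train, xs, ys):
    """Mean squared error of the 1-NN model on the points (xs, ys)."""
    errors = [(y - predict_1nn(x_train, y_train, x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)
x_train, y_train = make_data(100, rng)
x_test, y_test = make_data(100, rng)

train_mse = mse(x_train, y_train, x_train, y_train)
test_mse = mse(x_train, y_train, x_test, y_test)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

Every training point is its own nearest neighbour, so the training error is exactly zero, while on unseen points the model reproduces the noise it memorized. The gap between the two errors is the usual sign of high variance.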
Interview phrase:
“High variance leads to overfitting.”
Bias vs Variance Trade-off
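The goal is to pick a level of model complexity that balances the two: complex enough to capture the real patterns, simple enough not to fit noise. Pushing one source of error down usually pushes the other up.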
| Aspect | High Bias (Underfitting) | High Variance (Overfitting) |
|---|---|---|
| Model | Too simple | Too complex |
| Training error | High | Low |
| Test error | High | High |
| Problem | Misses patterns | Learns noise |
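For squared-error loss the trade-off can be written down exactly. The notation below is an assumption added for illustration, not taken from the text above: data are generated as y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a random training set, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first two terms depend on the model and are the ones the table compares; the last term is a floor that no model can go below.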
Intuition (Very important for interviews)
Think of a target board:
- High bias: All shots are far from the center (consistent but wrong)
- High variance: Shots are scattered all over (inconsistent)
- Good model: Shots are close to the center and close to each other
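The target-board picture can be simulated by treating each freshly drawn training set as one shot. The sketch below is only illustrative: the ground-truth function, the noise level, and the two toy models (predict the mean label, or copy the nearest neighbour's label) are assumptions chosen to make the contrast visible.

```python
import math
import random

def true_f(x):
    # Assumed ground truth for the simulation.
    return math.sin(2 * math.pi * x)

def sample_training_set(n, rng, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    """Very simple model: ignore x0 and predict the average label."""
    return sum(ys) / len(ys)

def predict_1nn(xs, ys, x0):
    """Very flexible model: copy the label of the nearest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias2_and_variance(predict, x0, n_trials=500, n_train=30, seed=0):
    """Estimate squared bias and variance of predict at x0 over many training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):  # one "shot" per training set
        xs, ys = sample_training_set(n_train, rng)
        preds.append(predict(xs, ys, x0))
    mean_pred = sum(preds) / n_trials
    bias2 = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_trials
    return bias2, variance

x0 = 0.25  # query point, where the assumed true_f peaks
bias2_simple, var_simple = bias2_and_variance(predict_mean, x0)
bias2_flexible, var_flexible = bias2_and_variance(predict_1nn, x0)
print(f"mean model: bias^2={bias2_simple:.3f}, variance={var_simple:.3f}")
print(f"1-NN model: bias^2={bias2_flexible:.3f}, variance={var_flexible:.3f}")
```

The mean model's shots should cluster tightly but away from the true value (high bias, low variance), while the 1-NN model's shots should center near the true value but spread out more (low bias, high variance).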
How to fix them (common interview follow-up)
To reduce Bias:
- Use more complex models
- Add more features
- Reduce regularization
- Train longer (for iteratively trained models such as neural networks or gradient boosting)
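As one illustration of the list above, the sketch below shows the effect of reducing regularization. It assumes a one-feature ridge regression with the penalty applied to the slope only; the data, the lambda values, and the helper names are hypothetical.

```python
import random

def fit_ridge_line(xs, ys, lam):
    """Fit y = intercept + slope * x with an L2 penalty lam * slope**2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)  # lam = 0 recovers ordinary least squares
    intercept = y_mean - slope * x_mean
    return intercept, slope

def mse(xs, ys, intercept, slope):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng, noise_sd=0.2):
    # Assumed ground truth: y = 3x plus Gaussian noise, x uniform on [0, 1].
    xs = [rng.random() for _ in range(n)]
    ys = [3 * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

rng = random.Random(0)
x_train, y_train = make_data(100, rng)
x_test, y_test = make_data(100, rng)

results = {}  # lam -> (train MSE, test MSE)
for lam in (1000.0, 10.0, 0.0):
    intercept, slope = fit_ridge_line(x_train, y_train, lam)
    results[lam] = (
        mse(x_train, y_train, intercept, slope),
        mse(x_test, y_test, intercept, slope),
    )
    print(f"lambda={lam:>6}: slope={slope:.2f}, "
          f"train MSE={results[lam][0]:.3f}, test MSE={results[lam][1]:.3f}")
```

With a very large penalty the slope is shrunk toward zero and the model behaves like a constant, which is an underfit. Lowering lambda lets the slope approach the true value and brings the training error down.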
To reduce Variance:
- Get more training data
- Use regularization (L1/L2)
- Reduce model complexity
- Use cross-validation to detect overfitting and tune hyperparameters (it diagnoses high variance rather than reducing it directly)
- Use ensemble methods (bagging, Random Forest)
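To make the ensemble item concrete, here is a rough sketch of bagging. It assumes a 1-nearest-neighbour regressor as the high-variance base model (not a tree or a Random Forest), and measures how much the prediction at one query point changes across freshly drawn training sets. The data-generating function and all helper names are hypothetical.

```python
import math
import random

def sample_training_set(n, rng, noise_sd=0.3):
    # Assumed ground truth: y = sin(2*pi*x) plus Gaussian noise, x uniform on [0, 1].
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def predict_1nn(xs, ys, x0):
    """High-variance base model: copy the label of the nearest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def predict_bagged_1nn(xs, ys, x0, rng, n_bags=25):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    n = len(xs)
    total = 0.0
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        total += predict_1nn([xs[i] for i in idx], [ys[i] for i in idx], x0)
    return total / n_bags

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)
x0 = 0.25  # query point
single_preds, bagged_preds = [], []
for _ in range(300):  # each iteration draws a fresh training set
    xs, ys = sample_training_set(30, rng)
    single_preds.append(predict_1nn(xs, ys, x0))
    bagged_preds.append(predict_bagged_1nn(xs, ys, x0, rng))

var_single = variance(single_preds)
var_bagged = variance(bagged_preds)
print(f"variance of single 1-NN: {var_single:.3f}")
print(f"variance of bagged 1-NN: {var_bagged:.3f}")
```

Averaging over bootstrap resamples spreads the prediction across several nearby training points instead of one, so the noise in any single label matters less and the bagged prediction should vary less from one training set to the next.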
One-line interview answer
“Bias is error due to overly simple assumptions causing underfitting, while variance is error due to sensitivity to training data causing overfitting. The challenge is to balance both to achieve good generalization.”
