Bias and Variance
Bias and variance are two fundamental concepts in ML. If you don't know about them yet, you're missing the basics.
Model Types
1. Overfit Model
When your model (say a machine learning method/algorithm, or simply a curve) performs very well on the training dataset (imagine the curve passing through every training point), there is a high chance the model is overfitting.
Note
Predicted values: points on the green curve
Actual values: the data points in red
Let's understand the figure above.
Here, (high) variance is introduced. Variance can simply be understood as how much the model's fit changes when it is trained on different datasets — how far apart the predictions of two fits (trained on two different training sets) end up on the same test data. (Outside the ML context, variance is simply how scattered a set of values is.)
If you have a model (ML model / equation / ML algorithm — all the same thing here) and you fit it on two different training datasets, and the test errors of the two fits differ greatly, that is called high variance. Here, the test error varies greatly depending on which training dataset was selected.
2. Underfit Model
When your model (say a machine learning method/algorithm, or simply a curve) performs only somewhat well (imagine the curve missing most of the training points), there is a chance the model is underfitting.
Let's understand the figure above.
Here, (high) bias is introduced. Bias can simply be understood as how far the predicted values (the curve) are from the expected values (the red dots).
If you have a model (ML model / equation / ML algorithm — all the same thing here) and you fit it on two different training datasets, and the training error is high in both cases, that is called high bias. Here, the training error stays high no matter which training dataset is taken.
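High bias can be sketched the same way: fit a model that is too simple for the data and watch the training error stay high across different training sets. The quadratic ground truth and the straight-line model below are illustrative assumptions, not anything prescribed by the post.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_error_linear():
    # Hypothetical ground truth is quadratic, but we fit a straight line,
    # which is too simple to capture the curvature (underfitting).
    x = rng.uniform(-2, 2, 30)
    y = x ** 2 + rng.normal(0, 0.1, 30)
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    return np.mean((pred - y) ** 2)

# Draw five different training sets; the training error stays high
# on every one of them, because the line cannot bend to follow x^2.
errors = [train_error_linear() for _ in range(5)]
print(errors)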
3. Balanced Model
When your model (say a machine learning method/algorithm, or simply a curve) performs well (imagine the curve passing near a good number of training points, with only a small gap between the curve and the points), there is a good chance the model is balanced. When that happens, the model is likely to have low error (good predictions) on a new test dataset.
Low variance and low bias together result in a good model.
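The three model types can be compared side by side in one small sketch. Again, the `sin(x)` ground truth and the specific degrees (1 = underfit, 3 = roughly balanced, 15 = overfit) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_and_score(degree, n_train=30, n_test=200):
    # Hypothetical ground truth: noisy sin(x) on [0, 3]
    def sample(n):
        x = rng.uniform(0, 3, n)
        return x, np.sin(x) + rng.normal(0, 0.1, n)

    x_tr, y_tr = sample(n_train)
    x_te, y_te = sample(n_test)
    coeffs = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    return train_mse, test_mse

for degree in (1, 3, 15):
    tr, te = fit_and_score(degree)
    print(f"degree {degree}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Typically degree 1 shows high error on both sets (high bias), degree 15 shows very low training error but a noticeably higher test error (high variance), and degree 3 keeps both errors low — the balanced model this section describes.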
Remembering Tips 💡
a. Bias
How much the model fails to capture the true pattern in the training dataset, resulting in an underfit model (consistently wrong predictions on new data).
b. Variance
It is the amount by which the predictions would change if we fit the model to a different training dataset — in other words, sensitivity to the training data. High variance shows up as overfitting (bad predictions on new data).
Note
Blue Dots: Training points
Green Dots: Testing points