Bias-variance tradeoff (underfitting, overfitting)
Understanding this concept begins with defining what a sample (training) data set is. All statistical and machine learning models need a well-documented, labelled data set and a hypothesis function in order to train and build a machine learning system. The sample is drawn from a population whose distribution is unknown, although we can make an assumption about its nature. The sample is a small representative of that large population. A major part of the machine learning process is then defining and fine-tuning the hypothesis function and estimating the parameters of the distribution from which the sample was drawn. Linear regression, for example, is a machine learning model whose typical hypothesis function is a linear mapping from input variables to output variables, or labels. One could include higher-degree polynomial terms to fit the training data better. The choice of hypothesis function defines the model capacity. Without elaborating further, the point is that the choice of model and the definition of the hypothesis function are critical to how well a machine learning system performs.
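To make the idea of a hypothesis function concrete, here is a minimal sketch in plain Python. The names `hypothesis` and `fit_line`, the synthetic data, and the "true" line y = 1 + 2x are all assumptions made for illustration; they are not taken from any particular library.

```python
import random

def hypothesis(w, x):
    """Polynomial hypothesis h(x) = w[0] + w[1]*x + w[2]*x**2 + ...

    A weight list of length 2 gives the usual linear hypothesis;
    a longer list adds higher-degree terms (more capacity).
    """
    return sum(wk * x ** k for k, wk in enumerate(w))

def fit_line(xs, ys):
    """Closed-form least-squares fit of the linear hypothesis w0 + w1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return [w0, w1]

# Synthetic training sample (assumption): y = 1 + 2x plus Gaussian noise.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + random.gauss(0, 0.5) for x in xs]

w = fit_line(xs, ys)
train_mse = sum((hypothesis(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print("estimated weights:", w)
print("training MSE:", train_mse)
```

The estimated weights should land near the assumed values 1 and 2, but not exactly on them, because the sample is noisy and finite.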
Machine learning then amounts to identifying, from this sample, the original distribution that generated the data: from a few pictures of a cat, can we build a system that recognizes cats in any image? What are parameters? They define the character of the distribution. For example, if we assume a Bernoulli distribution, the parameter that defines it is the mean. Similarly, for a Gaussian distribution the parameters are the mean and the variance. The job of an ML algorithm is to estimate these parameters. Note that these are called estimates: we do not know the actual mean, because we have not seen the whole population from which the sample was drawn.
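As a rough illustration of parameter estimation, the sketch below draws a sample from a Bernoulli and a Gaussian distribution and computes the maximum likelihood estimates of their parameters. The "true" values (`true_p`, `true_mu`, `true_sigma`) and the sample size are assumptions chosen for the example; in practice the true values are exactly what we do not know.

```python
import random
import statistics

random.seed(0)
n = 5000

# Bernoulli: the single parameter is the mean p.
true_p = 0.3  # assumed for the demo, unknown in practice
coin = [1 if random.random() < true_p else 0 for _ in range(n)]
p_hat = sum(coin) / n  # MLE of p is the sample mean

# Gaussian: the parameters are the mean and the variance.
true_mu, true_sigma = 2.0, 1.5  # assumed for the demo
draws = [random.gauss(true_mu, true_sigma) for _ in range(n)]
mu_hat = statistics.fmean(draws)               # MLE of the mean
var_hat = statistics.pvariance(draws, mu_hat)  # MLE of the variance (divides by n)

print("Bernoulli p estimate:", p_hat)
print("Gaussian mean estimate:", mu_hat)
print("Gaussian variance estimate:", var_hat)
```

The estimates should sit close to the assumed values without matching them exactly. Re-running with a different seed gives slightly different numbers, which is the sense in which they are only estimates.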
Given an estimate, how do we know it is accurate, or close to the actual value? Here we stand on the shoulders of statistical learning theory, which shows that bias and variance are two measures of the error between the estimated and the actual values of the parameters. Furthermore, maximum likelihood estimation (MLE), the most common technique for estimating parameters, has been shown to produce estimates that converge to the true values as the sample grows, under standard regularity conditions. This has been verified for many of the distributions commonly used in practice (all the work has already been done!!).
The background above explains where bias and variance come from, why they matter, and how they relate to "underfitting" and "overfitting". Bias captures how well the hypothesis function fits the training data: a large bias (which shows up as a large training error) implies the hypothesis is not capturing the training data well, that is, it is underfitting. Variance captures how well the hypothesis function performs on the test data: a large variance (which shows up as a test error much larger than the training error) implies the hypothesis function does not generalize well, that is, it is overfitting.
There are levers one can use to reduce each kind of error. Gathering a larger data sample mainly reduces variance, while making the hypothesis function more expressive (for example, using higher-degree polynomials in linear regression) mainly reduces bias, at the risk of increasing variance. That is the tradeoff.
The process is iterative!! Adjust a lever, re-check the training and test errors, and repeat.
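The link between model capacity, training error and test error can be sketched with a small experiment: fit polynomials of increasing degree to one noisy training sample and compare the errors on training and test data. Everything here is an assumption made for illustration: the target function sin(3x), the noise level, the sample sizes, the degrees tried, and the helper names `solve`, `fit_poly`, `predict` and `mse`. The fit uses the normal equations in plain Python, which is fine for a toy example but not what one would use at scale.

```python
import math
import random

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) w = X^T y."""
    d = degree + 1
    X = [[x ** k for k in range(d)] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(d)]
    return solve(XtX, Xty)

def predict(w, x):
    return sum(wk * x ** k for k, wk in enumerate(w))

def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def target(x):
    return math.sin(3 * x)  # assumed "unknown" function behind the data

random.seed(0)
train_x = [-1 + 2 * i / 19 for i in range(20)]
train_y = [target(x) + random.gauss(0, 0.2) for x in train_x]
test_x = [-0.995 + 2 * i / 200 for i in range(200)]
test_y = [target(x) + random.gauss(0, 0.2) for x in test_x]

train_err, test_err = {}, {}
for degree in (1, 3, 9):
    w = fit_poly(train_x, train_y, degree)
    train_err[degree] = mse(w, train_x, train_y)
    test_err[degree] = mse(w, test_x, test_y)
    print(f"degree {degree}: train MSE {train_err[degree]:.4f}, "
          f"test MSE {test_err[degree]:.4f}")
```

Under these assumptions the straight line should show a large error on both sets (underfitting), and the training error can only go down as the degree rises. Whether the highest degree visibly overfits depends on the noise and the sample, so the thing to look at is the gap between its training and test error.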