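Consider a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$. We assume that $y_i$ is a corrupted measurement of $f(x_i)$ with some noise $\epsilon_i \sim \mathcal{N}(0, \sigma_y^2)$, i.e.

$$y_i = f(x_i) + \epsilon_i.$$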
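The goal of regression is to find a model $f_{\mathbf{w}}$ that can capture the relationship between $x_i$ and $y_i$. Let $\mathbf{w}$ be the parameters of this model. Using Maximum Likelihood Estimation (MLE), and treating the samples as independent, this objective can be written as

$$\mathbf{w}_{\text{MLE}} = \arg\max_{\mathbf{w}} \prod_{i=1}^{N} p(y_i \mid x_i, \mathbf{w}),$$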
where $p(y_i \mid x_i, \mathbf{w})$ is the likelihood that we get such an observation $y_i$ from $x_i$ under a model with parameters $\mathbf{w}$.
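Because we assume that $\epsilon_i$ is normally distributed, i.e. $p(y_i \mid x_i, \mathbf{w}) = \mathcal{N}\big(y_i;\, f_{\mathbf{w}}(x_i),\, \sigma_y^2\big)$, the objective above is

$$\begin{aligned}
\mathbf{w}_{\text{MLE}} &= \arg\max_{\mathbf{w}} \sum_{i=1}^{N} \log p(y_i \mid x_i, \mathbf{w}) \\
&= \arg\max_{\mathbf{w}} \sum_{i=1}^{N} \left[ \log \frac{1}{\sqrt{2\pi\sigma_y^2}} - \frac{1}{2\sigma_y^2} \big(y_i - f_{\mathbf{w}}(x_i)\big)^2 \right] \\
&= \arg\min_{\mathbf{w}} \sum_{i=1}^{N} \big(y_i - f_{\mathbf{w}}(x_i)\big)^2.
\end{aligned}$$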
We cancel two terms in the derivation, the additive normalization constant and the positive factor $1/(2\sigma_y^2)$, because they do not depend on $\mathbf{w}$ and hence have no influence on the optimization.
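The equivalence between Gaussian MLE and least squares can be checked numerically. The following is a minimal sketch, assuming a linear model $f_{\mathbf{w}}(x) = x^\top \mathbf{w}$ and synthetic data; the names `gaussian_nll`, `w_true`, and `sigma_y` are illustrative and not part of the original derivation.

```python
import numpy as np

def gaussian_nll(w, X, y, sigma_y):
    """Negative log-likelihood of y under y_i ~ N(x_i . w, sigma_y^2)."""
    residuals = y - X @ w
    n = y.shape[0]
    constant = 0.5 * n * np.log(2.0 * np.pi * sigma_y**2)  # does not depend on w
    return constant + np.sum(residuals**2) / (2.0 * sigma_y**2)

# Synthetic data from an assumed linear model with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
sigma_y = 0.3
y = X @ w_true + rng.normal(scale=sigma_y, size=200)

# Ordinary least squares: minimizes the sum of squared residuals.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# The least-squares solution also minimizes the Gaussian NLL,
# whichever noise scale is plugged into the likelihood.
for s in (0.3, 3.0):
    nll_at_ls = gaussian_nll(w_ls, X, y, s)
    nll_perturbed = [
        gaussian_nll(w_ls + 0.1 * rng.normal(size=3), X, y, s) for _ in range(5)
    ]
    print(s, nll_at_ls <= min(nll_perturbed))
```

Changing the noise scale shifts and rescales the negative log-likelihood but leaves its minimizer where it is, which is why both terms can be dropped.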
Maximum a Posteriori
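The MLE approach above considers only the likelihood term in Bayes' rule:

$$p(\mathbf{w} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{D})}.$$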
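Because $p(\mathcal{D})$ does not depend on $\mathbf{w}$, the posterior satisfies

$$p(\mathbf{w} \mid \mathcal{D}) \propto p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w}).$$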
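Using Maximum a Posteriori (MAP), one can find suitable parameters by maximizing $p(\mathbf{w} \mid \mathcal{D})$:

$$\begin{aligned}
\mathbf{w}_{\text{MAP}} &= \arg\max_{\mathbf{w}} p(\mathbf{w} \mid \mathcal{D}) \\
&= \arg\max_{\mathbf{w}} \big[ \log p(\mathcal{D} \mid \mathbf{w}) + \log p(\mathbf{w}) \big].
\end{aligned}$$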
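Consider each parameter $w_j \sim \mathcal{N}(0, \sigma_w^2)$, drawn independently for $j = 1, \dots, d$. The second term is then

$$\log p(\mathbf{w}) = \sum_{j=1}^{d} \log p(w_j) = \sum_{j=1}^{d} \left[ \log \frac{1}{\sqrt{2\pi\sigma_w^2}} - \frac{w_j^2}{2\sigma_w^2} \right].$$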
$L_2$ Regularizer (Weight Decay)
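Borrowing the intermediate derivation step of the likelihood term from the previous section and substituting the prior term into the optimization yield

$$\begin{aligned}
\mathbf{w}_{\text{MAP}} &= \arg\max_{\mathbf{w}} \left[ -\frac{1}{2\sigma_y^2} \sum_{i=1}^{N} \big(y_i - f_{\mathbf{w}}(x_i)\big)^2 - \frac{1}{2\sigma_w^2} \sum_{j=1}^{d} w_j^2 \right] \\
&= \arg\min_{\mathbf{w}} \left[ \frac{1}{2\sigma_y^2} \sum_{i=1}^{N} \big(y_i - f_{\mathbf{w}}(x_i)\big)^2 + \frac{1}{2\sigma_w^2} \sum_{j=1}^{d} w_j^2 \right] \\
&= \arg\min_{\mathbf{w}} \left[ \sum_{i=1}^{N} \big(y_i - f_{\mathbf{w}}(x_i)\big)^2 + \lambda \sum_{j=1}^{d} w_j^2 \right], \qquad \lambda = \frac{\sigma_y^2}{\sigma_w^2}.
\end{aligned}$$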
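For the last step, we multiply the objective by $2\sigma_y^2$, which is positive and therefore does not change the minimizer. From the result above, $\lambda = \sigma_y^2 / \sigma_w^2$ is a hyperparameter that governs how much we would like to regularize the model, and one can interpret it as the ratio between the variance of the data noise and the variance of the prior on the parameters.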
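To make the role of the regularization strength concrete, here is a minimal sketch assuming a linear model, for which the $L_2$-regularized least-squares problem has the closed-form solution $(X^\top X + \lambda I)^{-1} X^\top y$. The helper `map_weights` and the values chosen for the noise and prior standard deviations (`sigma_y`, `sigma_w`) are hypothetical and serve only as an illustration.

```python
import numpy as np

def map_weights(X, y, sigma_y, sigma_w):
    """MAP estimate for a linear model with Gaussian noise and a Gaussian prior.

    Minimizes sum_i (y_i - x_i . w)^2 + lam * sum_j w_j^2,
    where lam is the ratio of noise variance to prior variance.
    """
    lam = sigma_y**2 / sigma_w**2
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return w, lam

# Synthetic data from an assumed linear model with Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
w_true = np.array([2.0, -1.0, 0.0, 0.5])
sigma_y = 0.5
y = X @ w_true + rng.normal(scale=sigma_y, size=50)

# A wide prior gives a small lam (weak regularization);
# a tight prior gives a large lam (strong shrinkage toward zero).
w_wide, lam_wide = map_weights(X, y, sigma_y, sigma_w=10.0)
w_tight, lam_tight = map_weights(X, y, sigma_y, sigma_w=0.1)

print("lam (wide prior): ", lam_wide, " ||w|| =", np.linalg.norm(w_wide))
print("lam (tight prior):", lam_tight, " ||w|| =", np.linalg.norm(w_tight))
```

A tighter prior (smaller `sigma_w`) or noisier data (larger `sigma_y`) both increase the regularization strength and shrink the estimated weights toward zero.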
The content of this article is largely summarised from