Background: Linear Regression
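Suppose we have a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. We assume that
$$y_i = f(x_i) + \epsilon_i,$$
where $f(x) = w^\top x$ is the true function and $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. Then, we can construct the likelihood
$$p(\mathcal{D} \mid w) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid w^\top x_i, \sigma^2).$$
Because $\log \mathcal{N}(y_i \mid w^\top x_i, \sigma^2) = -\frac{1}{2\sigma^2}(y_i - w^\top x_i)^2 + \text{const}$, maximizing the (empirical) log likelihood is equivalent to minimizing the squared loss:
$$\mathcal{L}(w) = \sum_{i=1}^{n} (y_i - w^\top x_i)^2.$$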
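Taking the derivative w.r.t. $w$ yields
$$\nabla_w \mathcal{L}(w) = -2 \sum_{i=1}^{n} (y_i - w^\top x_i)\, x_i.$$
Let $X = [x_1, \dots, x_n]^\top \in \mathbb{R}^{n \times d}$ and $y = [y_1, \dots, y_n]^\top \in \mathbb{R}^n$, so that $\mathcal{L}(w) = \|y - Xw\|_2^2$. The derivative is
$$\nabla_w \mathcal{L}(w) = -2 X^\top (y - Xw).$$
Setting the derivative to zero yields (assuming $X^\top X$ is invertible)
$$w = (X^\top X)^{-1} X^\top y.$$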
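As a quick numerical sanity check, the closed-form least squares solution can be sketched in NumPy. The synthetic data, the noise level, and the names `squared_loss_grad` and `w_ols` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = X w_true + Gaussian noise
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

def squared_loss_grad(w, X, y):
    """Gradient of ||y - Xw||^2 with respect to w."""
    return -2.0 * X.T @ (y - X @ w)

# Solve the normal equations X^T X w = X^T y without forming an explicit inverse.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient should vanish (up to floating point error) at the solution.
grad_norm = np.linalg.norm(squared_loss_grad(w_ols, X, y))
```

Using `np.linalg.solve` rather than inverting $X^\top X$ is a common numerical choice; the result is the same solution as the formula above.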
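For ridge regression, the loss has an additional term $\lambda \|w\|_2^2$ with $\lambda > 0$ (called the regularizer), i.e. $\mathcal{L}_{\text{ridge}}(w) = \|y - Xw\|_2^2 + \lambda \|w\|_2^2$. The solution is
$$w = (X^\top X + \lambda I)^{-1} X^\top y.$$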
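The effect of the regularizer can be illustrated with a minimal sketch of the ridge solution. The helper name `ridge_fit`, the synthetic data, and the chosen values of `lam` are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, 0.0, -1.0, 2.0, 0.5]) + 0.1 * rng.normal(size=30)

w_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 recovers ordinary least squares
w_small = ridge_fit(X, y, lam=1.0)
w_large = ridge_fit(X, y, lam=100.0)

# Larger lam shrinks the weights toward zero.
norms = [np.linalg.norm(w) for w in (w_ols, w_small, w_large)]
```

The norm of the solution decreases as `lam` grows, which is the shrinkage behavior the regularizer is meant to produce.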
Nonlinear Regression with Kernels: Kernel Ridge Regression
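Instead of using $x$ directly, we can consider transforming $x$ into some feature space first. More precisely, we would like to find a feature map $\phi: \mathbb{R}^d \to \mathbb{R}^D$. We then assume a model
$$y_i = w^\top \phi(x_i) + \epsilon_i.$$
Hence, the solution can be found using the regularized least squares equation:
$$w = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y,$$
where $\Phi = [\phi(x_1), \dots, \phi(x_n)]^\top \in \mathbb{R}^{n \times D}$.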
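To make the feature-space view concrete, here is a small sketch that uses an explicit polynomial feature map as a stand-in for $\phi$. The choice of polynomial features, the target function, and the names `poly_features`, `mse_feat`, and `mse_lin` are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def poly_features(x, degree):
    """Map scalar inputs x (shape (n,)) to rows [1, x, x^2, ..., x^degree]."""
    return np.vander(x, N=degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = x**3 - x + 0.05 * rng.normal(size=30)   # a nonlinear target (assumed)

lam = 1e-3

# Regularized least squares in feature space.
Phi = poly_features(x, degree=3)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
mse_feat = np.mean((y - Phi @ w) ** 2)

# The same estimator on the raw input, for comparison.
X_lin = x[:, None]
w_lin = np.linalg.solve(X_lin.T @ X_lin + lam * np.eye(1), X_lin.T @ y)
mse_lin = np.mean((y - X_lin @ w_lin) ** 2)
```

The only change between the two fits is the design matrix, which is why the ridge formula carries over unchanged once $x$ is replaced by $\phi(x)$.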
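However, finding such a $\phi$ is not trivial. To overcome this, we instead define $\phi$ implicitly through a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$. Under technical conditions on $k$ (symmetry and positive semi-definiteness), it is known that
$$k(x, x') = \langle \phi(x), \phi(x') \rangle.$$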
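For some kernels the feature map is known explicitly, which makes the identity easy to verify by hand. The sketch below uses the homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$ as an assumed example; the names `phi` and `poly2_kernel` are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """k(x, z) = (x^T z)^2, computed without forming phi."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = poly2_kernel(x, z)       # kernel evaluated in input space
rhs = float(phi(x) @ phi(z))   # inner product in feature space
```

Here $x^\top z = 1$, so both `lhs` and `rhs` equal 1 up to floating point error. For kernels such as the Gaussian (RBF) kernel the feature space is infinite-dimensional, and only the kernel evaluation is practical.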
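Consider a test sample $x_*$. The prediction is
$$\hat{y}_* = w^\top \phi(x_*).$$
Define $K = \Phi \Phi^\top \in \mathbb{R}^{n \times n}$ (where the rows of $\Phi$ are $\phi(x_i)^\top$), so that $K_{ij} = k(x_i, x_j)$, and $k_* = \Phi \phi(x_*) = [k(x_1, x_*), \dots, k(x_n, x_*)]^\top$. From the representer theorem, we also know that the solution lies in the span of $\{\phi(x_i)\}_{i=1}^{n}$. More precisely, it is in the following form:
$$w = \sum_{i=1}^{n} \alpha_i \phi(x_i) = \Phi^\top \alpha.$$
Substituting this form into the regularized least squares problem gives $\alpha = (K + \lambda I)^{-1} y$. Therefore, in other words, we have
$$\hat{y}_* = k_*^\top (K + \lambda I)^{-1} y.$$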
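A minimal sketch of kernel ridge regression along these lines is given below. The RBF kernel, its `lengthscale`, the value of `lam`, the sine-shaped toy data, and the function names are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def krr_fit(X, y, lam, lengthscale=1.0):
    """Dual coefficients alpha = (K + lam I)^{-1} y."""
    K = rbf_kernel(X, X, lengthscale)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, lengthscale=1.0):
    """Predictions k_*^T alpha for every row of X_test."""
    return rbf_kernel(X_test, X_train, lengthscale) @ alpha

rng = np.random.default_rng(0)
X_train = np.linspace(-3, 3, 40)[:, None]
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=40)
X_test = np.linspace(-2.5, 2.5, 25)[:, None]

alpha = krr_fit(X_train, y_train, lam=0.1)
y_pred = krr_predict(X_train, alpha, X_test)
max_err = np.max(np.abs(y_pred - np.sin(X_test[:, 0])))
```

Note that the feature map never appears: fitting needs only the $n \times n$ matrix $K$, and prediction needs only kernel evaluations between test and training points.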
Fig. 1: Prediction from linear regression, kernel ridge regression, and Gaussian processes.
Gaussian Processes: Bayesian Kernel Regression
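From Bayes' rule, we know that
$$p(w \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}, \qquad p(\mathcal{D}) = \int p(\mathcal{D} \mid w)\, p(w)\, dw.$$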
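In general, $p(\mathcal{D})$ is intractable. However, if we choose the prior $p(w)$ appropriately, the term can be computed analytically. In particular, in the case of regression (i.e. Gaussian likelihood), we can choose $p(w) = \mathcal{N}(0, I)$. Because Gaussian distributions are closed under multiplication (up to normalization), the posterior distribution is also a Gaussian distribution. In this case, we have
$$p(w \mid \mathcal{D}) = \mathcal{N}(w \mid \mu_w, \Sigma_w),$$
where $\Sigma_w = (\sigma^{-2} X^\top X + I)^{-1}$ and $\mu_w = \sigma^{-2} \Sigma_w X^\top y$.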
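The Gaussian posterior over the weights can be computed in a few lines. The sketch below assumes an isotropic prior $\mathcal{N}(0, \texttt{prior\_var} \cdot I)$ and a known noise variance; the function name `blr_posterior` and the toy data are illustrative assumptions.

```python
import numpy as np

def blr_posterior(X, y, noise_var, prior_var=1.0):
    """Posterior mean and covariance of w for Bayesian linear regression."""
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = X @ np.array([0.7, -1.2]) + 0.2 * rng.normal(size=40)
noise_var = 0.2**2

mean, cov = blr_posterior(X, y, noise_var)

# With a unit-variance prior, the posterior mean equals the ridge
# solution with lam = noise_var.
w_ridge = np.linalg.solve(X.T @ X + noise_var * np.eye(2), X.T @ y)
```

The comparison with `w_ridge` shows the link to ridge regression: the posterior mean is the ridge estimate, while the covariance additionally quantifies the uncertainty about $w$.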
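Using the feature map $\phi$, i.e. replacing $X$ with the matrix $\Phi$ whose rows are $\phi(x_i)^\top$, we can rewrite $\mu_w$ and $\Sigma_w$ as
$$\mu_w = (\Phi^\top \Phi + \sigma^2 I)^{-1} \Phi^\top y, \qquad \Sigma_w = \sigma^2 (\Phi^\top \Phi + \sigma^2 I)^{-1}.$$
For linear regression with a Gaussian prior (i.e. ridge regression with $\lambda = \sigma^2$), the predictive mean and variance of the latent function value $f_* = w^\top \phi(x_*)$ at a test sample $x_*$ are
$$\mu_* = \phi(x_*)^\top \mu_w = \phi(x_*)^\top (\Phi^\top \Phi + \sigma^2 I)^{-1} \Phi^\top y,$$
$$\sigma_*^2 = \phi(x_*)^\top \Sigma_w\, \phi(x_*) = \sigma^2\, \phi(x_*)^\top (\Phi^\top \Phi + \sigma^2 I)^{-1} \phi(x_*).$$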
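Therefore, using the identity $(\Phi^\top \Phi + \sigma^2 I)^{-1} \Phi^\top = \Phi^\top (\Phi \Phi^\top + \sigma^2 I)^{-1}$ together with $K = \Phi \Phi^\top$ and $k_* = \Phi \phi(x_*)$, we have
$$\mu_* = k_*^\top (K + \sigma^2 I)^{-1} y.$$
Similarly, using the Sherman-Morrison-Woodbury formula, we have
$$\sigma_*^2 = k(x_*, x_*) - k_*^\top (K + \sigma^2 I)^{-1} k_*.$$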
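The equivalence between the weight-space and kernel forms of the predictive mean and variance can be checked numerically. The sketch below assumes a unit Gaussian prior on the weights and uses random finite-dimensional features purely for illustration; all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 8, 4
Phi = rng.normal(size=(n, D))     # rows play the role of phi(x_i)^T
phi_star = rng.normal(size=D)     # plays the role of phi(x_*)
y = rng.normal(size=n)
noise_var = 0.3

# Weight-space form: D x D linear systems.
A = Phi.T @ Phi + noise_var * np.eye(D)
mean_w = phi_star @ np.linalg.solve(A, Phi.T @ y)
var_w = noise_var * (phi_star @ np.linalg.solve(A, phi_star))

# Kernel form: n x n linear systems, using only inner products of features.
K = Phi @ Phi.T
k_star = Phi @ phi_star
k_ss = phi_star @ phi_star
B = K + noise_var * np.eye(n)
mean_f = k_star @ np.linalg.solve(B, y)
var_f = k_ss - k_star @ np.linalg.solve(B, k_star)
```

Both forms give the same numbers; the kernel form is the one that remains usable when the feature space is very large or infinite-dimensional, since it only touches $\phi$ through inner products.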
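In summary, we have
$$f_* \mid x_*, \mathcal{D} \sim \mathcal{N}\big(k_*^\top (K + \sigma^2 I)^{-1} y,\; k(x_*, x_*) - k_*^\top (K + \sigma^2 I)^{-1} k_*\big).$$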
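Putting the pieces together, a Gaussian process regressor can be sketched as follows. This is a minimal illustration assuming a unit-variance RBF kernel, a fixed `lengthscale` and `noise_var`, and a Cholesky-based solve; the function names and toy data are assumptions, and no hyperparameter fitting is attempted.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Unit-variance RBF kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise_var=0.01, lengthscale=1.0):
    """Posterior mean and variance of the latent function at X_test."""
    K = rbf_kernel(X_train, X_train, lengthscale)
    K_star = rbf_kernel(X_test, X_train, lengthscale)   # shape (m, n)
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X_train)))
    # alpha = (K + noise_var I)^{-1} y via two solves with the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star @ alpha
    V = np.linalg.solve(L, K_star.T)                    # shape (n, m)
    var = 1.0 - np.sum(V**2, axis=0)                    # k(x, x) = 1 for this kernel
    return mean, var

rng = np.random.default_rng(0)
X_train = np.linspace(-3, 3, 20)[:, None]
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=20)
X_test = np.array([[0.0], [10.0]])   # one point inside the data, one far away

mean, var = gp_predict(X_train, y_train, X_test)
```

The predictive mean has the same form as the kernel ridge regression prediction with $\lambda$ equal to the noise variance. What the Gaussian process adds is the variance: it is small near the training data and returns to the prior variance far away from it.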
Fig. 2: Linear regression and Gaussian processes trained on datasets of different sizes.
These are materials I consulted while writing this article:
Figures were made using Google Colab.