Background: Taylor Expansion
Consider a function $f(x)$, and let's say we would like to approximate $f$ around a local maximum $x_0$. We start by finding the optimal value using calculus, i.e. solving $f'(x_0) = 0$; note also that at $x_0$ we have $f''(x_0) < 0$. We then do the Taylor expansion to the 2nd order around $x_0$:

$$f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 = f(x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2,$$

where the first-order term vanishes because $f'(x_0) = 0$.
Let's try this on a concrete function: we compute $f'$ and $f''$, locate the maximum, and plug these into the expansion above. The resulting quadratic approximation is shown in Fig. 1.
Fig. 1: Approximating a function with Taylor expansion.
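Since the expansion only needs $f(x_0)$ and $f''(x_0)$, it is easy to check numerically. Below is a minimal sketch, assuming (as a stand-in for the example above) the function $f(x) = \ln x - x$, which has its maximum at $x_0 = 1$:

```python
import numpy as np

def f(x):
    # stand-in example: f(x) = ln(x) - x, with maximum at x0 = 1
    return np.log(x) - x

def taylor2(x, x0, fx0, d2fx0):
    # 2nd-order Taylor expansion around a maximum x0 (the f'(x0) term is zero)
    return fx0 + 0.5 * d2fx0 * (x - x0) ** 2

x0 = 1.0      # mode: f'(x) = 1/x - 1 = 0  =>  x0 = 1
d2fx0 = -1.0  # f''(x) = -1/x^2, evaluated at x0

xs = np.linspace(0.9, 1.1, 101)
err = np.max(np.abs(f(xs) - taylor2(xs, x0, f(x0), d2fx0)))
print(err)  # the approximation is accurate near the mode
```

Further away from $x_0$ the quadratic diverges from $f$, which foreshadows the limitation discussed at the end of the article.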
In brief, the Laplace (sometimes also called Gaussian) approximation applies the Taylor expansion to $\ln P^*(x)$, where $P^*$ is an unnormalized density, to compute the normalizing constant

$$Z_P = \int P^*(x)\, dx,$$

such that $P(x) = \frac{1}{Z_P} P^*(x)$. In other words, $P$ is a proper probability distribution and $P^*$ is, e.g., a likelihood.
Let's assume that we only care about the behaviour of $P^*(x)$ around the mode $x_0$. Define $g(x) = \ln P^*(x)$. From the previous section, we know that

$$g(x) \approx g(x_0) - \frac{c}{2}(x - x_0)^2, \qquad c = -g''(x_0) > 0.$$

Thus, we get

$$P^*(x) \approx P^*(x_0) \exp\left(-\frac{c}{2}(x - x_0)^2\right), \qquad Z_P \approx P^*(x_0) \sqrt{\frac{2\pi}{c}}.$$

In other words, $P(x) \approx \mathcal{N}(x;\, x_0,\, c^{-1})$.
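The recipe (mode $x_0$ from $g'$, curvature $c = -g''(x_0)$, then $Z_P \approx P^*(x_0)\sqrt{2\pi/c}$) can be sanity-checked on an assumed example, $P^*(x) = x^2 e^{-x}$, whose true normalizer is $\int_0^\infty x^2 e^{-x}\, dx = 2$:

```python
import numpy as np

def p_star(x):
    # assumed unnormalized density: P*(x) = x^2 exp(-x); true Z_P = 2
    return x**2 * np.exp(-x)

# g(x) = ln P*(x) = 2 ln x - x;  g'(x) = 2/x - 1 = 0  =>  mode x0 = 2
x0 = 2.0
# curvature: g''(x) = -2/x^2, so c = -g''(x0) = 2 / x0^2 = 0.5
c = 2.0 / x0**2

z_laplace = p_star(x0) * np.sqrt(2 * np.pi / c)

# crude numerical integral on [0, 50] for comparison
xs = np.linspace(0.0, 50.0, 200001)
z_numeric = np.sum(p_star(xs)) * (xs[1] - xs[0])
print(z_laplace, z_numeric)  # roughly 1.92 vs 2.00
```

The Laplace estimate is within a few percent of the true value, even though $P^*$ here is visibly skewed.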
Application 1: Approximating Posterior Distributions
Consider a dataset $\mathcal{D}$ and a model with parameters $\theta \in \mathbb{R}^D$. We want to estimate the posterior

$$P(\theta \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \theta)\, P(\theta)}{P(\mathcal{D})}.$$

Using the identity $a = \exp(\ln a)$, we can write

$$P(\theta \mid \mathcal{D}) = \frac{e^{g(\theta)}}{Z},$$

where $g(\theta) = \ln \big( P(\mathcal{D} \mid \theta)\, P(\theta) \big)$ and $Z = P(\mathcal{D}) = \int e^{g(\theta)}\, d\theta$. For the sake of completeness, we repeat the same derivation again, but for the $D$-dimensional case. We perform the Taylor expansion on $g$ at the mode $\theta_0$, yielding

$$g(\theta) \approx g(\theta_0) + \frac{1}{2} (\theta - \theta_0)^\top H (\theta - \theta_0),$$

where the first-order term is zero by construction and $H = \nabla \nabla g(\theta) \big|_{\theta = \theta_0}$ is the Hessian. Therefore, the posterior is

$$P(\theta \mid \mathcal{D}) \approx \frac{e^{g(\theta_0)}}{Z} \exp\left( -\frac{1}{2} (\theta - \theta_0)^\top (-H)\, (\theta - \theta_0) \right),$$

which is in the form of a Gaussian distribution if we set $\mu = \theta_0$ and $\Sigma = (-H)^{-1}$, i.e. $P(\theta \mid \mathcal{D}) \approx \mathcal{N}(\theta;\, \theta_0,\, (-H)^{-1})$.
Fig. 2: Two posterior approximations using Laplace approximation.
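A minimal sketch of the multivariate recipe, assuming a made-up unnormalized log-posterior $g(\theta)$ whose mode is known to be at the origin; the Hessian is estimated by finite differences and the Laplace covariance is $\Sigma = (-H)^{-1}$:

```python
import numpy as np

def g(theta):
    # assumed example: unnormalized log-posterior with mode at theta = (0, 0)
    t1, t2 = theta
    return -t1**2 - t2**2 - (t1**2) * (t2**2)

def hessian(fn, theta0, eps=1e-4):
    # finite-difference estimate of the Hessian of fn at theta0
    d = len(theta0)
    H = np.zeros((d, d))
    I = np.eye(d) * eps
    for i in range(d):
        for j in range(d):
            H[i, j] = (fn(theta0 + I[i] + I[j]) - fn(theta0 + I[i] - I[j])
                       - fn(theta0 - I[i] + I[j]) + fn(theta0 - I[i] - I[j])) / (4 * eps**2)
    return H

theta0 = np.zeros(2)       # mode (known analytically for this example)
H = hessian(g, theta0)     # here H = diag(-2, -2)
Sigma = np.linalg.inv(-H)  # Laplace covariance
print(Sigma)               # ~ [[0.5, 0], [0, 0.5]]
```

In practice the mode $\theta_0$ would be found by numerical optimization of $g$ rather than assumed; the finite-difference Hessian is likewise a stand-in for an analytic or autodiff Hessian.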
Photon Counter Problem (MacKay (2013), Ex. 27.1)
Define $r$ as the number of photons measured; we want to infer their arrival rate $\lambda$. We assume that $r$ follows a Poisson distribution, hence

$$P(r \mid \lambda) = e^{-\lambda} \frac{\lambda^r}{r!}.$$

We further assume that we have an improper prior, $P(\lambda) = 1/\lambda$. Define

$$g(\lambda) = \ln \big( P(r \mid \lambda)\, P(\lambda) \big) = -\lambda + (r - 1) \ln \lambda - \ln r!.$$

Taking the derivative yields

$$g'(\lambda) = -1 + \frac{r - 1}{\lambda} = 0 \quad \Rightarrow \quad \lambda_0 = r - 1.$$

We also have $g''(\lambda) = -\frac{r - 1}{\lambda^2}$, thus

$$c = -g''(\lambda_0) = \frac{r - 1}{(r - 1)^2} = \frac{1}{r - 1}.$$

Therefore, the approximated posterior is

$$P(\lambda \mid r) \approx \mathcal{N}(\lambda;\, \mu,\, \sigma^2),$$

where $\mu = r - 1$ and $\sigma^2 = c^{-1} = r - 1$.
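These closed-form results are easy to check against the exact posterior: with prior $1/\lambda$ the posterior is $\propto \lambda^{r-1} e^{-\lambda}$, i.e. a Gamma$(r, 1)$ distribution. A sketch assuming an observed count of $r = 10$:

```python
import numpy as np
from math import factorial

r = 10       # assumed observed count
mu = r - 1   # Laplace mean: the posterior mode
sigma2 = r - 1  # Laplace variance: 1/c with c = 1/(r - 1)

def laplace_posterior(lam):
    # Gaussian approximation N(lam; mu, sigma2)
    return np.exp(-0.5 * (lam - mu)**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def exact_posterior(lam):
    # Gamma(r, 1) density: lam^(r-1) e^(-lam) / (r-1)!
    return lam**(r - 1) * np.exp(-lam) / factorial(r - 1)

# the two densities agree well around the mode
print(laplace_posterior(mu), exact_posterior(mu))
```

Away from the mode the agreement degrades, since the Gaussian is symmetric while the Gamma posterior is skewed (and assigns no mass to $\lambda < 0$).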
Application 2: Stirling's Formula
Consider the Euler Gamma function

$$\Gamma(t + 1) = \int_0^\infty x^t e^{-x}\, dx.$$

We can see that $\Gamma(1) = 1$ and $\Gamma(t + 1) = t\, \Gamma(t)$; in other words, this function is the factorial function: $\Gamma(t + 1) = t!$ for non-negative integers $t$.

Although we know the formula for the factorial, our goal is to approximate the function in closed form without explicitly computing the factorial. We start by defining a new variable $z$ via

$$x = t z,$$

thus $dx = t\, dz$, $x \to 0 \Leftrightarrow z \to 0$, and $x \to \infty \Leftrightarrow z \to \infty$. Using integration by substitution yields

$$t! = \int_0^\infty (tz)^t e^{-tz}\, t\, dz = t^{t+1} \int_0^\infty e^{t\, h(z)}\, dz,$$

where $h(z) = \ln z - z$. Computing the first and second derivatives, we get

- $h'(z) = \frac{1}{z} - 1$; setting it to zero yields the maximum at $z_0 = 1$.
- $h''(z) = -\frac{1}{z^2}$; therefore, the second derivative at $z_0 = 1$ is $h''(z_0) = -1$.

Therefore, using the Laplace approximation on $e^{t h(z)}$ (with curvature $c = -t\, h''(z_0) = t$), we arrive at

$$\int_0^\infty e^{t h(z)}\, dz \approx e^{t h(z_0)} \sqrt{\frac{2\pi}{t}} = e^{-t} \sqrt{\frac{2\pi}{t}},$$

and hence

$$t! \approx t^{t+1} e^{-t} \sqrt{\frac{2\pi}{t}} = \sqrt{2\pi t}\; t^t e^{-t},$$

which is Stirling's formula. With this approximation, we can compute $t!$ for a range of $t$ and compare it with the exact value.
Fig. 3: Relative error of the $t!$ approximation.
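The comparison can be reproduced in a few lines of Python; the relative error of Stirling's formula shrinks as $t$ grows:

```python
import numpy as np
from math import factorial

def stirling(t):
    # Stirling's approximation: t! ~ sqrt(2*pi*t) * t^t * e^(-t)
    return np.sqrt(2 * np.pi * t) * t**t * np.exp(-t)

errs = []
for t in [1, 2, 5, 10, 20]:
    rel_err = abs(factorial(t) - stirling(t)) / factorial(t)
    errs.append(rel_err)
    print(t, rel_err)  # relative error decreases with t
```

For larger $t$ it is safer to compare in log space (e.g. against `math.lgamma(t + 1)`) to avoid floating-point overflow of $t^t$.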
One obvious limitation of the Laplace approximation is inherited from the Taylor expansion: it captures only the local behaviour of the function near the expansion point; in the case of approximating the posterior distribution, this point is the mode of the distribution.
I consulted the following materials while writing this article:
Figures in this article were made using Google Colab.