The exponential family is a large class of probability distributions, both discrete and continuous, including the Gaussian and Bernoulli distributions. As the name suggests, distributions in this family share a generic exponential form.
Consider a random variable $x$ drawn from an exponential family distribution. Its probability mass function (if $x$ is discrete) or probability density function (if $x$ is continuous) is written as

$$p(x \mid \theta) = h(x)\, \exp\left( \eta(\theta)^\top T(x) - A(\eta) \right),$$

where

- $T(x)$ is $x$'s sufficient statistic(s);
- $\eta$ is the natural parameter(s) of the distribution;
- $\theta$ is the parameter(s) of the distribution;
- $A(\eta)$ is the log-partition function, which acts as a normalizer;
- $h(x)$ is a function that depends only on $x$.
Some Distributions in the Exponential Family
The Bernoulli distribution has one parameter, called $\pi$. Its sample space is $\{0, 1\}$, e.g. coin tossing. Its probability mass function is usually written in the following form:

$$p(x \mid \pi) = \pi^x (1 - \pi)^{1 - x}.$$
We can rewrite the above equation using the exponential-logarithm trick:

$$p(x \mid \pi) = \exp\left( x \log \pi + (1 - x) \log(1 - \pi) \right) = \exp\left( x \log \frac{\pi}{1 - \pi} + \log(1 - \pi) \right).$$

So, we can conclude the following:

$$T(x) = x, \qquad \eta = \log \frac{\pi}{1 - \pi}, \qquad A(\eta) = -\log(1 - \pi) = \log(1 + e^\eta), \qquad h(x) = 1.$$
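As a quick sanity check (a minimal sketch; the function names are mine), we can verify numerically that the exponential-family form reproduces the standard Bernoulli pmf:

```python
import math

def bernoulli_pmf(x, pi):
    # Standard form: pi^x * (1 - pi)^(1 - x)
    return pi ** x * (1 - pi) ** (1 - x)

def bernoulli_expfam(x, pi):
    # Exponential-family form with T(x) = x, h(x) = 1,
    # eta = log(pi / (1 - pi)), A(eta) = log(1 + exp(eta))
    eta = math.log(pi / (1 - pi))
    A = math.log(1 + math.exp(eta))
    return math.exp(eta * x - A)

for x in (0, 1):
    assert abs(bernoulli_pmf(x, 0.3) - bernoulli_expfam(x, 0.3)) < 1e-12
```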
Let's turn to an exponential family distribution for continuous random variables. The most important one is the Gaussian distribution. In the univariate setting, i.e. $x \in \mathbb{R}$, the density is

$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).$$

Expanding the square and collecting terms gives the exponential-family form with

$$T(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \qquad \eta = \begin{pmatrix} \mu / \sigma^2 \\ -1 / (2\sigma^2) \end{pmatrix}, \qquad A(\eta) = \frac{\mu^2}{2\sigma^2} + \log \sigma, \qquad h(x) = \frac{1}{\sqrt{2\pi}}.$$
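A quick numerical check (a sketch; the function names are mine) that writing the Gaussian with $T(x) = (x, x^2)$, $\eta = (\mu/\sigma^2,\, -1/(2\sigma^2))$, $A = \mu^2/(2\sigma^2) + \log\sigma$, and $h(x) = 1/\sqrt{2\pi}$ reproduces the familiar density:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    # Standard form of the univariate Gaussian density
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def gaussian_expfam(x, mu, sigma2):
    # Exponential-family form with T(x) = (x, x^2), h(x) = 1/sqrt(2*pi),
    # eta = (mu/sigma^2, -1/(2*sigma^2)), A = mu^2/(2*sigma^2) + log(sigma)
    eta1, eta2 = mu / sigma2, -1 / (2 * sigma2)
    A = mu ** 2 / (2 * sigma2) + 0.5 * math.log(sigma2)
    return math.exp(eta1 * x + eta2 * x ** 2 - A) / math.sqrt(2 * math.pi)

for x in (-1.0, 0.0, 2.5):
    assert abs(gaussian_pdf(x, 1.5, 2.0) - gaussian_expfam(x, 1.5, 2.0)) < 1e-12
```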
Cumulant Generating Function
Let $A(\eta)$ be the log-partition function. It is also the cumulant generating function of the sufficient statistic: $\frac{\partial A}{\partial \eta} = \mathbb{E}[T(x)]$ and $\frac{\partial^2 A}{\partial \eta^2} = \mathrm{Var}[T(x)]$. In the following, we are going to show that we can get the moment parameters of the Bernoulli and Gaussian distributions from $A(\eta)$.
Let's recall that $\eta = \log \frac{\pi}{1 - \pi}$ for Bernoulli distributions. We have

$$e^\eta = \frac{\pi}{1 - \pi}.$$

After rearranging the equation, it yields

$$\pi = \frac{1}{1 + e^{-\eta}} = \frac{e^\eta}{1 + e^\eta}, \qquad A(\eta) = -\log(1 - \pi) = \log(1 + e^\eta).$$

Taking the first and second derivatives, we have

$$A'(\eta) = \frac{e^\eta}{1 + e^\eta} = \pi, \qquad A''(\eta) = \frac{e^\eta}{(1 + e^\eta)^2} = \pi (1 - \pi).$$
Therefore, we recover the mean and the variance of the Bernoulli distribution.
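These derivative identities can be checked numerically with central finite differences (a minimal sketch; the helper names are mine):

```python
import math

def A(eta):
    # Bernoulli log-partition function A(eta) = log(1 + exp(eta))
    return math.log(1 + math.exp(eta))

def diff(f, x, h=1e-4):
    # Central finite difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

pi = 0.3
eta = math.log(pi / (1 - pi))
mean = diff(A, eta)                    # first derivative  ~ pi
var = diff(lambda t: diff(A, t), eta)  # second derivative ~ pi * (1 - pi)
assert abs(mean - pi) < 1e-6
assert abs(var - pi * (1 - pi)) < 1e-6
```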
Note: you might find the function transforming $\eta$ into $\pi$ familiar; indeed, $\pi = \frac{1}{1 + e^{-\eta}}$ is the sigmoid function! In generalized linear models, its inverse, the logit, serves as the canonical link function for Bernoulli responses.
Recall $\eta = \left( \mu / \sigma^2,\; -1/(2\sigma^2) \right)^\top$ of the Gaussian distribution. Let $\eta_1 = \mu / \sigma^2$ and $\eta_2 = -1/(2\sigma^2)$. Solving these equations for $\mu$ and $\sigma^2$, we have

$$\mu = -\frac{\eta_1}{2\eta_2}, \qquad \sigma^2 = -\frac{1}{2\eta_2}.$$

We know that $A(\eta)$ corresponds to $\frac{\mu^2}{2\sigma^2} + \log \sigma$, i.e. $A(\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2} \log(-2\eta_2)$. If we compute the partial derivatives $\frac{\partial A}{\partial \eta_1}$ and $\frac{\partial A}{\partial \eta_2}$, we get

$$\frac{\partial A}{\partial \eta_1} = -\frac{\eta_1}{2\eta_2} = \mu, \qquad \frac{\partial A}{\partial \eta_2} = \frac{\eta_1^2}{4\eta_2^2} - \frac{1}{2\eta_2} = \mu^2 + \sigma^2.$$
That means we recover the Gaussian distribution's mean $\mathbb{E}[x] = \mu$ (first moment) and its second moment $\mathbb{E}[x^2] = \mu^2 + \sigma^2$, hence the variance $\sigma^2 = \mathbb{E}[x^2] - \mu^2$, by differentiating its cumulant generating function $A(\eta)$.
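A quick numerical check of these partial derivatives, using $A(\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\log(-2\eta_2)$ (a sketch; the names are mine):

```python
import math

def A(eta1, eta2):
    # Gaussian log-partition in natural parameters:
    # A(eta) = -eta1^2 / (4 * eta2) - 0.5 * log(-2 * eta2)
    return -eta1 ** 2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)

mu, sigma2 = 1.5, 2.0
eta1, eta2 = mu / sigma2, -1 / (2 * sigma2)
h = 1e-5
# Central differences for the two partial derivatives
dA1 = (A(eta1 + h, eta2) - A(eta1 - h, eta2)) / (2 * h)  # ~ E[x]   = mu
dA2 = (A(eta1, eta2 + h) - A(eta1, eta2 - h)) / (2 * h)  # ~ E[x^2] = mu^2 + sigma^2
assert abs(dA1 - mu) < 1e-6
assert abs(dA2 - (mu ** 2 + sigma2)) < 1e-6
```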
While writing this article, I relied on Prof. M. Opper and Théo's lecture slides for the Probabilistic Bayesian Modelling course (Summer 2020) and Prof. M. Jordan's reading material for his Bayesian Modeling and Inference course (2010).
The first figure was made with Google Colab.