The exponential family is a large class of probability distributions, both discrete and continuous, including the Gaussian and Bernoulli distributions. As the name suggests, distributions in this family share a generic exponential form.

Consider a random variable $X$ from an exponential family distribution. Its probability mass function (if $X$ is discrete) or probability density function (if $X$ is continuous) can be written as

$$p(x; \theta) = f(x) \exp \Big( \eta(\theta)^T \phi(x) + g(\theta) \Big),$$
where

- $\phi(x)$ is $X$'s sufficient statistic(s);
- $\eta(\theta)$ is the natural parameter(s) of the distribution;
- $\theta$ is the parameter(s) of the distribution;
- $g(\theta)$ is the (negative) log-partition function, which acts as a normalizer;
- $f(x)$ is a function that depends only on $x$.

## Some Distributions in Exponential Family

### Bernoulli Distribution

The Bernoulli distribution has one parameter, $p \in [0, 1]$. Its sample space is $\Omega = \{0, 1\}$, e.g. coin tossing. Its probability mass function is usually written in the following form:

$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}.$$
We can rewrite the above equation using the exponential-logarithm trick:

$$P(X = x) = \exp \bigg( x \log \bigg( \frac{p}{1-p} \bigg) + \log(1-p) \bigg).$$
So, we can conclude the following:

- $f(x) = 1$;
- $\phi(x) = x$;
- $\eta(p) = \log \bigg( \frac{p}{1-p} \bigg)$;
- $g(p) = \log (1-p)$.
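As a quick sanity check, the sketch below (with an arbitrarily chosen $p$) evaluates the Bernoulli pmf both directly and through the exponential-family form with the components listed above:

```python
import math

p = 0.3  # arbitrary illustrative value
eta = math.log(p / (1 - p))  # natural parameter eta(p)
g = math.log(1 - p)          # g(p)

for x in (0, 1):
    pmf = p**x * (1 - p)**(1 - x)          # direct pmf
    ef = 1.0 * math.exp(eta * x + g)       # f(x) * exp(eta * phi(x) + g), f(x) = 1, phi(x) = x
    assert abs(pmf - ef) < 1e-12
```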

### Gaussian Distribution

Let's turn to an exponential family distribution for continuous random variables. The most important one is the Gaussian distribution. In the univariate setting, i.e. $x \in \mathbb{R}$, the density is

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp \bigg( -\frac{(x - \mu)^2}{2\sigma^2} \bigg),$$
where

- $f(x) = \frac{1}{\sqrt{2\pi}}$;
- $\phi(x) = (x, x^2)^T$;
- $\eta(\theta) = \big( \frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2} \big)^T$;
- $g(\theta) = - \frac{\mu^2}{2\sigma^2} - \log \sigma$.
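The same kind of check works here. The sketch below (with illustrative $\mu$ and $\sigma$) evaluates the Gaussian density directly and via the exponential-family form, using $\eta_2 = -\frac{1}{2\sigma^2}$:

```python
import math

mu, sigma = 1.5, 0.8  # illustrative values
f = 1.0 / math.sqrt(2 * math.pi)                # f(x), constant in x
eta = (mu / sigma**2, -1.0 / (2 * sigma**2))    # natural parameters
g = -mu**2 / (2 * sigma**2) - math.log(sigma)   # g(theta)

for x in (-1.0, 0.0, 2.3):
    direct = math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
    ef = f * math.exp(eta[0] * x + eta[1] * x**2 + g)  # phi(x) = (x, x^2)
    assert abs(direct - ef) < 1e-12
```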

## Cumulant: Moment Generating Function

Let $\eta = \eta(\theta)$ and define the cumulant $A(\eta) \equiv -g(\theta)$. In the following, we show that we can recover the moment parameters of the Bernoulli and Gaussian distributions from $A(\eta)$.

### Bernoulli Distribution

Let us recall that $g(\theta) = \log(1-p)$ for Bernoulli distributions. We have

$$A(\eta) = -\log(1-p).$$
Since $\eta = \log \big( \frac{p}{1-p} \big)$, we have $1 - p = \frac{1}{1 + e^\eta}$; substituting and rearranging yields

$$A(\eta) = \log(1 + e^\eta).$$
Taking the first and second derivatives, we have

$$\frac{dA}{d\eta} = \frac{e^\eta}{1 + e^\eta} = p, \qquad \frac{d^2A}{d\eta^2} = \frac{e^\eta}{(1 + e^\eta)^2} = p(1-p).$$
Therefore, we recover the mean $p$ and the variance $p(1-p)$ of Bernoulli distributions.

As a side note, you might notice that the function transforming $\eta$ into $p$ looks familiar; indeed, it is the sigmoid function! In generalized linear models, it is the inverse of the (logit) link function.
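The derivative computation above can also be sketched numerically: central finite differences of $A(\eta) = \log(1 + e^\eta)$, at an arbitrarily chosen $p$, should recover the mean $p$ and variance $p(1-p)$.

```python
import math

def A(eta):
    # cumulant of the Bernoulli distribution
    return math.log(1 + math.exp(eta))

p = 0.3                          # illustrative value
eta = math.log(p / (1 - p))      # natural parameter
h = 1e-5                         # finite-difference step

dA = (A(eta + h) - A(eta - h)) / (2 * h)              # ~ first derivative = p
d2A = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2   # ~ second derivative = p(1-p)

assert abs(dA - p) < 1e-6
assert abs(d2A - p * (1 - p)) < 1e-4
```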

### Gaussian Distribution

Recall $g(\theta)$ of the Gaussian distribution. Let $(\eta_1, \eta_2)^T \equiv \eta(\theta)$ and $A(\eta_1, \eta_2) = -g(\theta)$. Substituting $\eta_1 = \frac{\mu}{\sigma^2}$ and $\eta_2 = -\frac{1}{2\sigma^2}$ and solving the equation, we have

$$A(\eta_1, \eta_2) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2} \log(-2\eta_2).$$
We know that $\eta_1$ corresponds to $\phi(x)_1$, i.e. $x$. Computing the partial derivatives $\frac{\partial}{\partial \eta_1} A(\eta)$ and $\frac{\partial^2}{\partial \eta_1^2} A(\eta)$, we get

$$\frac{\partial A}{\partial \eta_1} = -\frac{\eta_1}{2\eta_2} = \mu, \qquad \frac{\partial^2 A}{\partial \eta_1^2} = -\frac{1}{2\eta_2} = \sigma^2.$$
That means we recover the Gaussian distribution's mean (first moment) and variance (second central moment) by differentiating its cumulant $A(\cdot)$.
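As with the Bernoulli case, this can be sketched numerically: finite differences of $A(\eta_1, \eta_2)$ in $\eta_1$, at illustrative values of $\mu$ and $\sigma^2$, should recover the mean and variance.

```python
import math

def A(e1, e2):
    # cumulant of the univariate Gaussian in natural parameters
    return -e1**2 / (4 * e2) - 0.5 * math.log(-2 * e2)

mu, sigma2 = 1.5, 0.64                   # illustrative values
e1, e2 = mu / sigma2, -1 / (2 * sigma2)  # natural parameters
h = 1e-5                                 # finite-difference step

dA = (A(e1 + h, e2) - A(e1 - h, e2)) / (2 * h)              # ~ dA/d(eta_1) = mu
d2A = (A(e1 + h, e2) - 2 * A(e1, e2) + A(e1 - h, e2)) / h**2  # ~ d^2A/d(eta_1)^2 = sigma^2

assert abs(dA - mu) < 1e-6
assert abs(d2A - sigma2) < 1e-4
```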

## References

While writing this article, I relied on Prof. M. Opper & Théo's lecture slides for the Probabilistic Bayesian Modelling course (Summer 2020) and Prof. M. Jordan's reading material for his Bayesian Modeling and Inference course (2010).

The first figure was made with Google Colab.