The exponential family is a large class of probability distributions, both discrete and continuous; it includes, among others, the Gaussian and Bernoulli distributions. As the name suggests, distributions in this family share a generic exponential form.
Consider a random variable X drawn from an exponential family distribution. Its probability mass function (if X is discrete) or probability density function (if X is continuous) can be written as
p(x∣θ)=f(x)exp(η(θ)⋅ϕ(x)+g(θ)),
where
ϕ(x) is X's sufficient statistic(s);
η(θ) is the natural parameter(s) of the distribution;
θ is the parameter(s) of the distribution;
g(θ) is the log-partition function, which acts as a normalizer;
f(x) is the base measure.
The Bernoulli distribution has one parameter, p∈[0,1]. Its sample space is Ω={0,1}, e.g., the outcome of a coin toss. Its probability mass function is usually written in the following form:
p(x∣p)=px(1−p)1−x.
We can rewrite the above equation using the exponential-logarithm trick:

$$p(x \mid p) = \exp\bigl(x \log p + (1-x)\log(1-p)\bigr) = \exp\Bigl(x \log\tfrac{p}{1-p} + \log(1-p)\Bigr).$$

Matching this against the generic form, we read off ϕ(x) = x, η(p) = log(p/(1−p)), g(p) = log(1−p), and f(x) = 1.
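To make the decomposition concrete, here is a minimal Python sketch (the helper expfam_density and the component names are ours, purely for illustration) that evaluates the generic form f(x)exp(η(θ)⋅ϕ(x)+g(θ)) with the Bernoulli components just derived and checks it against the familiar pmf:

```python
import math

def expfam_density(x, f, phi, eta, g, theta):
    """Evaluate f(x) * exp(eta(theta) . phi(x) + g(theta)).

    phi and eta return lists so the same code handles
    vector-valued sufficient statistics and natural parameters.
    """
    dot = sum(e * s for e, s in zip(eta(theta), phi(x)))
    return f(x) * math.exp(dot + g(theta))

# Bernoulli components read off above:
# phi(x) = x, eta(p) = log(p/(1-p)), g(p) = log(1-p), f(x) = 1.
bernoulli = dict(
    f=lambda x: 1.0,
    phi=lambda x: [x],
    eta=lambda p: [math.log(p / (1 - p))],
    g=lambda p: math.log(1 - p),
)

p = 0.3
for x in (0, 1):
    direct = p**x * (1 - p) ** (1 - x)           # p^x (1-p)^(1-x)
    via_exp = expfam_density(x, theta=p, **bernoulli)
    assert math.isclose(direct, via_exp)
print("Bernoulli pmf matches its exponential-family form.")
```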
Let's turn to an exponential family distribution for continuous random variables. The most important one is the Gaussian distribution. In the univariate setting, i.e. x∈ℝ, the density is

$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl(-\frac{(x-\mu)^2}{2\sigma^2}\Bigr).$$

Expanding the square and collecting terms gives the exponential-family form with ϕ(x) = (x, x²)ᵀ, η(θ) = (μ/σ², −1/(2σ²))ᵀ, g(θ) = −μ²/(2σ²) − log σ, and f(x) = 1/√(2π).
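As a quick sanity check of this decomposition (again a sketch; gaussian_pdf and gaussian_expfam are our own throwaway helpers), the two forms agree numerically:

```python
import math

def gaussian_pdf(x, mu, var):
    """Textbook N(mu, var) density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_expfam(x, mu, var):
    """f(x) exp(eta . phi(x) + g(theta)) with phi = (x, x^2),
    eta = (mu/var, -1/(2 var)), g = -mu^2/(2 var) - log(sqrt(var)),
    and f(x) = 1/sqrt(2 pi)."""
    eta = (mu / var, -1 / (2 * var))
    phi = (x, x * x)
    g = -mu**2 / (2 * var) - 0.5 * math.log(var)
    f = 1 / math.sqrt(2 * math.pi)
    return f * math.exp(eta[0] * phi[0] + eta[1] * phi[1] + g)

for x in (-1.0, 0.0, 2.5):
    assert math.isclose(gaussian_pdf(x, 1.0, 4.0), gaussian_expfam(x, 1.0, 4.0))
print("Gaussian density matches its exponential-family form.")
```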
Let η = η(θ) and define the cumulant function A(η) ≡ −g(θ). In the following, we show that the moment parameters of the Bernoulli and Gaussian distributions can be recovered from A(η).
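The general identity we rely on is standard for exponential families: derivatives of the cumulant function generate the moments of the sufficient statistic,

$$\mathbb{E}[\phi(x)] = \nabla_{\eta} A(\eta), \qquad \mathrm{Cov}[\phi(x)] = \nabla_{\eta}^{2} A(\eta).$$

The two examples below verify this identity by direct computation.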
Bernoulli Distribution
Recall that g(θ) = log(1−p) for the Bernoulli distribution, so A(η) = −log(1−p). Inverting η = log(p/(1−p)) gives p = 1/(1+e^{−η}) and 1−p = 1/(1+e^η), hence

$$A(\eta) = \log(1 + e^{\eta}).$$

Differentiating, we have

$$\frac{\partial A}{\partial \eta} = \frac{e^{\eta}}{1+e^{\eta}} = p, \qquad \frac{\partial^2 A}{\partial \eta^2} = \frac{e^{\eta}}{(1+e^{\eta})^2} = p(1-p).$$
Therefore, we recover the mean p and the variance p(1−p) of the Bernoulli distribution.
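A quick numeric confirmation of this derivation (a sketch; central finite differences stand in for the analytic derivatives):

```python
import math

def A(eta):
    """Bernoulli cumulant A(eta) = log(1 + e^eta)."""
    return math.log(1 + math.exp(eta))

def diff(fun, x, h=1e-4):
    """Central finite difference approximating fun'(x)."""
    return (fun(x + h) - fun(x - h)) / (2 * h)

p = 0.3
eta = math.log(p / (1 - p))               # natural parameter
mean = diff(A, eta)                       # ~ p
var = diff(lambda e: diff(A, e), eta)     # ~ p(1-p)
print(f"A'(eta)  = {mean:.6f}, p      = {p}")
print(f"A''(eta) = {var:.6f}, p(1-p) = {p * (1 - p):.6f}")
```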
As a side note, you might notice that the function transforming η into p looks familiar; indeed, it is the sigmoid function! In generalized linear models, its inverse, the logit η(p) = log(p/(1−p)), is the canonical link function.
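For concreteness, the mapping and its inverse (a throwaway sketch):

```python
import math

def sigmoid(eta):
    """Mean from natural parameter: p = 1 / (1 + e^-eta)."""
    return 1 / (1 + math.exp(-eta))

def logit(p):
    """Canonical link: eta = log(p / (1 - p))."""
    return math.log(p / (1 - p))

assert math.isclose(sigmoid(logit(0.3)), 0.3)
```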
Gaussian Distribution
Recall g(θ) of the Gaussian distribution. Let (η₁, η₂)ᵀ ≡ η(θ) and A(η₁, η₂) = −g(θ). Inverting the natural parameters gives σ² = −1/(2η₂) and μ = −η₁/(2η₂); substituting into −g(θ) = μ²/(2σ²) + log σ yields

$$A(\eta_1, \eta_2) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\log(-2\eta_2).$$
We know that η₁ corresponds to the first component of ϕ(x), i.e. x. Computing the partial derivatives ∂A(η)/∂η₁ and ∂²A(η)/∂η₁², we get

$$\frac{\partial A}{\partial \eta_1} = -\frac{\eta_1}{2\eta_2} = \mu, \qquad \frac{\partial^2 A}{\partial \eta_1^2} = -\frac{1}{2\eta_2} = \sigma^2,$$

recovering the mean μ and the variance σ² of the Gaussian distribution.
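The same finite-difference sanity check works for the Gaussian cumulant (a sketch; the helpers are ours):

```python
import math

def A(eta1, eta2):
    """Gaussian cumulant A = -eta1^2 / (4 eta2) - (1/2) log(-2 eta2)."""
    return -eta1**2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)

def diff1(fun, x, y, h=1e-4):
    """Central finite difference in the first argument."""
    return (fun(x + h, y) - fun(x - h, y)) / (2 * h)

mu, var = 1.0, 4.0
eta1, eta2 = mu / var, -1 / (2 * var)                    # natural parameters
mean = diff1(A, eta1, eta2)                              # dA/deta1, ~ mu
second = diff1(lambda a, b: diff1(A, a, b), eta1, eta2)  # d^2A/deta1^2, ~ var
print(f"dA/deta1    = {mean:.6f}, mu  = {mu}")
print(f"d2A/deta1^2 = {second:.6f}, var = {var}")
```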