Consider a dataset $\{x_i\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$. Denote $X \in \mathbb{R}^{n \times d}$ to be the data matrix whose rows are the $x_i$'s. Without loss of generality, we assume that the data has zero mean; thus the covariance matrix is

$$\Sigma = \frac{1}{n} X^\top X.$$

Define $f_W(x) = Wx$ to be the whitening operator parameterized by $W \in \mathbb{R}^{d \times d}$, which is commonly referred to as the whitening matrix. Let $z = Wx$. The goal of whitening is to decorrelate the data; that is,

$$\mathrm{Cov}(z) = W \Sigma W^\top = I.$$
Recall that $\Sigma$ is symmetric and positive definite; it thus can be decomposed into

$$\Sigma = U \Lambda U^\top,$$

where $U$'s columns are $\Sigma$'s eigenvectors and $\Lambda$ is a diagonal matrix containing the real eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d > 0$.
Because $U$ is an orthonormal matrix, i.e. $U U^\top = U^\top U = I$, we have $U^{-1} = U^\top$. Consider $\Sigma^{-1}$. The above decomposition shows that

$$\Sigma^{-1} = (U \Lambda U^\top)^{-1} = U \Lambda^{-1} U^\top.$$

In other words, we would like to find $W$ that satisfies $W^\top W = \Sigma^{-1}$. We can see that this condition yields the diagonal covariance condition:

$$W \Sigma W^\top = W (W^\top W)^{-1} W^\top = W W^{-1} W^{-\top} W^\top = I.$$
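Before going over the concrete approaches, the setup above can be sketched numerically. This is a minimal sketch assuming synthetic 2-D Gaussian data (the dataset, sample size, and covariance values are made up for illustration); it checks that the eigendecomposition of the empirical covariance behaves as derived:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data (n samples, d = 2 features).
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=1000)
X = X - X.mean(axis=0)          # center the data (zero-mean assumption)
Sigma = X.T @ X / len(X)        # covariance matrix

# Eigendecomposition: Sigma = U diag(lam) U^T
lam, U = np.linalg.eigh(Sigma)

# U is orthonormal, and Sigma^{-1} = U diag(1/lam) U^T as derived above.
assert np.allclose(U @ U.T, np.eye(2), atol=1e-8)
assert np.allclose(np.linalg.inv(Sigma), U @ np.diag(1.0 / lam) @ U.T)
```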
Approach 1: PCA-Whitening
With this fact, a natural choice of the whitening operator is

$$W_{\mathrm{PCA}} = \Lambda^{-1/2} U^\top.$$

Thus, we can see that

$$W_{\mathrm{PCA}}^\top W_{\mathrm{PCA}} = U \Lambda^{-1/2} \Lambda^{-1/2} U^\top = U \Lambda^{-1} U^\top = \Sigma^{-1}.$$
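PCA whitening can be sketched in a few lines of NumPy. The synthetic data below is an assumption for illustration; the point is that the covariance of the whitened data comes out as the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data, centered.
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=1000)
X = X - X.mean(axis=0)
Sigma = X.T @ X / len(X)
lam, U = np.linalg.eigh(Sigma)

# PCA whitening: W = Lambda^{-1/2} U^T
W_pca = np.diag(lam ** -0.5) @ U.T
Z = X @ W_pca.T

# The whitened covariance is the identity.
assert np.allclose(Z.T @ Z / len(Z), np.eye(2), atol=1e-6)
```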
Approach 2: ZCA-Whitening
However, whitening is not unique, because whitened data remains whitened under any rotation: for an orthogonal matrix $R$, we have $(RW) \Sigma (RW)^\top = R I R^\top = I$. We can therefore impose an additional constraint on $W$. In particular, we can rotate the PCA-whitened data back so that it stays close to the original data. This is called Zero-phase Component Analysis (ZCA):

$$W_{\mathrm{ZCA}} = U W_{\mathrm{PCA}} = U \Lambda^{-1/2} U^\top.$$

In fact, one can see that $W_{\mathrm{ZCA}} = \Sigma^{-1/2}$.
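A sketch of ZCA whitening on the same kind of synthetic data (again, the data itself is a made-up example). Note that $W_{\mathrm{ZCA}}$ is symmetric and squares to $\Sigma^{-1}$, consistent with it being the inverse matrix square root of $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data, centered.
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=1000)
X = X - X.mean(axis=0)
Sigma = X.T @ X / len(X)
lam, U = np.linalg.eigh(Sigma)

# ZCA whitening: rotate PCA-whitened coordinates back with U.
W_zca = U @ np.diag(lam ** -0.5) @ U.T   # symmetric: equals Sigma^{-1/2}
Z = X @ W_zca.T

# Still whitened, and W_zca is the inverse square root of Sigma.
assert np.allclose(Z.T @ Z / len(Z), np.eye(2), atol=1e-6)
assert np.allclose(W_zca @ W_zca, np.linalg.inv(Sigma))
```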
Approach 3: Cholesky Decomposition
For a positive definite matrix $\Sigma$, we know that we can decompose it into

$$\Sigma = L L^\top,$$

where $L$ is a lower triangular matrix. Inverting the equation above gives us

$$\Sigma^{-1} = (L L^\top)^{-1} = L^{-\top} L^{-1} \overset{(1)}{=} (L^{-1})^\top L^{-1},$$

where (1) uses the fact that matrix inverse and transpose are exchangeable. Here, it is obvious that we can take

$$W_{\mathrm{Chol}} = L^{-1}.$$

Because $L$ is lower triangular, its inverse $L^{-1}$ is also a lower triangular matrix. This inverse can be computed efficiently using forward substitution.
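A sketch of Cholesky whitening, again on made-up synthetic data. For simplicity the sketch inverts $L$ with `np.linalg.inv`; in practice one would use a triangular solver (e.g. `scipy.linalg.solve_triangular`) to exploit forward substitution:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data, centered.
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=1000)
X = X - X.mean(axis=0)
Sigma = X.T @ X / len(X)

# Cholesky factor: Sigma = L L^T, with L lower triangular.
L = np.linalg.cholesky(Sigma)

# W = L^{-1} (a general inverse here; forward substitution in practice).
W_chol = np.linalg.inv(L)
Z = X @ W_chol.T

# W Sigma W^T = L^{-1} L L^T L^{-T} = I.
assert np.allclose(Z.T @ Z / len(Z), np.eye(2), atol=1e-6)
```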
Figure 1: Data whitened with various approaches. Although the results look similar, the ZCA-whitened data is the closest to the original.
These are the resources that I consulted while writing this blog: