Consider a dataset $\mathcal{D} = \{x_i \in \mathbb{R}^d\}_{i=1}^n$, and let $X \in \mathbb{R}^{d \times n}$ denote the data matrix. Without loss of generality, we assume that the data has zero mean; thus the covariance matrix is $\Sigma = \frac{1}{n} X X^T$.
Define $\psi(\cdot)$ to be the whitening operator parameterized by $W$, which is commonly referred to as the whitening matrix. Let $\hat{X} := \psi(X)$. The goal of whitening is to decorrelate the data; that is,

$$\frac{1}{n} \hat{X} \hat{X}^T = \hat{\Sigma} = I.$$
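To ground the notation, here is a minimal numpy sketch (an illustration, not part of the original derivation) that builds a centered toy data matrix and its covariance; the mixing matrix used to correlate the coordinates is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: n = 1000 samples in d = 3 dimensions, stored as columns of X.
n, d = 1000, 3
A = rng.normal(size=(d, d))                # arbitrary mixing matrix (assumed)
X = A @ rng.normal(size=(d, n))            # correlated samples x_i as columns

X = X - X.mean(axis=1, keepdims=True)      # enforce the zero-mean assumption
Sigma = (X @ X.T) / n                      # covariance, Sigma = (1/n) X X^T

print(np.round(Sigma, 2))                  # nonzero off-diagonals: correlated data
```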
Recall that $\Sigma$ is symmetric and positive definite; it can thus be decomposed as

$$\Sigma = U \Lambda U^T,$$

where the columns of $U$ are the eigenvectors of $\Sigma$ and $\Lambda$ is a diagonal matrix containing the real eigenvalues $\sigma_i^2$ for all $i \in [1, d]$.
Recall that $U$ is an orthogonal matrix, i.e., $u_i^T u_j = 0$ for all $i, j \in [1, d]$ with $i \neq j$, and $u_i^T u_i = 1$. Consider $u_i$: the decomposition above shows that

$$\frac{1}{n} (u_i^T X)(u_i^T X)^T = \frac{1}{n} \sum_{j=1}^n (u_i^T x_j)^2 = u_i^T \Sigma u_i = \sigma_i^2;$$

that is, the variance of the data projected onto the $i$-th eigenvector is $\sigma_i^2$.
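As a quick numerical check (a sketch under the same toy setup as above), `np.linalg.eigh` eigendecomposes the symmetric matrix $\Sigma$, and the variance of the data projected onto each eigenvector matches the corresponding eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(d, d)) @ rng.normal(size=(d, n))  # toy correlated data
X = X - X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

# eigh returns ascending eigenvalues and orthonormal eigenvectors (columns of U),
# so Sigma = U @ diag(eigvals) @ U.T.
eigvals, U = np.linalg.eigh(Sigma)
assert np.allclose(U @ np.diag(eigvals) @ U.T, Sigma)

# Variance of the data projected onto the i-th eigenvector is sigma_i^2.
for i in range(d):
    proj = U[:, i] @ X                     # u_i^T X, shape (n,)
    assert np.isclose(proj @ proj / n, eigvals[i])
```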
In other words, we would like to find $W$ that satisfies $W^T W = \Sigma^{-1}$. We can see that this condition yields the diagonal covariance condition:

$$\hat{\Sigma} = \frac{1}{n} W X X^T W^T = W \Sigma W^T = W (W^T W)^{-1} W^T = I.$$
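The following sketch verifies this condition numerically; the form $W = \Lambda^{-1/2} U^T$ used here is an assumption, presumably matching the PCA whitening discussed earlier in the post:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(d, d)) @ rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

eigvals, U = np.linalg.eigh(Sigma)
W = np.diag(eigvals ** -0.5) @ U.T         # PCA whitening matrix (assumed form)

# W^T W = U diag(1/sigma_i^2) U^T = Sigma^{-1} ...
assert np.allclose(W.T @ W, np.linalg.inv(Sigma))
# ... and the whitened data indeed has identity covariance.
X_hat = W @ X
assert np.allclose((X_hat @ X_hat.T) / n, np.eye(d))
```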
However, whitening is not unique, because whitened data remains whitened under any rotation: if $W$ satisfies $W^T W = \Sigma^{-1}$ and $R$ is orthogonal, then $(R W)^T (R W) = W^T R^T R W = \Sigma^{-1}$ as well. We can therefore impose an additional constraint on $\psi$. In particular, we can rotate the PCA-whitened data back so that it stays as close as possible to the original data. This is called Zero-Phase Component Analysis (ZCA):
$$\psi_{\mathrm{ZCA}}(X) = W_{\mathrm{ZCA}} X = U \Lambda^{-1/2} U^T X.$$
In fact, one can see that $W_{\mathrm{ZCA}} = \Sigma^{-1/2}$.
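A minimal sketch of ZCA under the same toy setup; it verifies both that the transformed data is whitened and that $W_{\mathrm{ZCA}}$ is the symmetric inverse square root of $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(d, d)) @ rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

eigvals, U = np.linalg.eigh(Sigma)
W_zca = U @ np.diag(eigvals ** -0.5) @ U.T  # W_ZCA = U Lambda^{-1/2} U^T

X_hat = W_zca @ X
assert np.allclose((X_hat @ X_hat.T) / n, np.eye(d))  # data is whitened

# W_ZCA is symmetric and squares to Sigma^{-1}, i.e. W_ZCA = Sigma^{-1/2}.
assert np.allclose(W_zca, W_zca.T)
assert np.allclose(W_zca @ W_zca, np.linalg.inv(Sigma))
```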
Approach 3: Cholesky Decomposition
For a positive definite matrix $\Sigma$, we know that we can decompose it into

$$\Sigma = L L^T,$$

where $L$ is a lower triangular matrix. Inverting the equation above gives us
$$\Sigma^{-1} = (L L^T)^{-1} = (L^T)^{-1} L^{-1} \overset{(1)}{=} (L^{-1})^T L^{-1},$$

where $(1)$ uses the fact that matrix inverse and transpose are interchangeable, i.e., $(L^T)^{-1} = (L^{-1})^T$. Here, it is obvious that we can take $W = L^{-1}$.
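A sketch of the resulting Cholesky whitening, using the same toy setup as above; `np.linalg.cholesky` returns the lower triangular factor $L$, and `scipy.linalg.solve_triangular` applies $L^{-1}$ without forming the inverse explicitly:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(d, d)) @ rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

L = np.linalg.cholesky(Sigma)               # Sigma = L @ L.T, L lower triangular
X_hat = solve_triangular(L, X, lower=True)  # X_hat = L^{-1} @ X

# W = L^{-1} satisfies W^T W = Sigma^{-1}, so the data is whitened.
assert np.allclose((X_hat @ X_hat.T) / n, np.eye(d))
```

Note that, unlike ZCA, this whitening matrix is triangular rather than symmetric, and the Cholesky factorization is generally cheaper to compute than a full eigendecomposition.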