Approaches for Whitening Data

February 09, 2021

Consider a dataset D={xiRd}i=1n\mathcal{D} = \{ \mathbf x_i \in \reals^d\}_{i=1}^n. Denote XRd×n\mathbf X \in \reals^{d \times n} be the data matrix. Without loss of generality, we assume that the data has zero mean; thus the covariance matrix is Σ=1nXXT\Sigma = \frac{1}{n} \mathbf X \mathbf X^T.

Define ψ()\psi(\cdot) be the whitening operator parameterized by WW, which is commonly referred to whitening matrix. Let X^:=ψ(X)\hat \mathbf X := \psi(\mathbf X). The goal of whitening is to decorate the data; that is

1nX^X^T=Σ^=I.\frac{1}{n} \hat \mathbf{X} \hat \mathbf{X}^T = \hat \Sigma = I.

Recall that Σ\Sigma is symmetric and positive definite; it thus can be decomposed into

Σ=UΛUT,\Sigma = U \Lambda U^T,

where UU's columns are Σ\Sigma's eigenvectors and Λ\Lambda is a diagonal matrix containing real eigenvalues σi2 i[1,d]\sigma^2_i\ \forall i \in [1,d].

Because UU is a orthonormal matrix, i.e. uiTuj=0i,j[1,m]\mathbf u_i ^T \mathbf u_j = 0 \forall i,j \in [1, m] and iji\ne j. Consider ui\mathbf u_i. The above derivation shows that

(uiTX)(uiTX)T=j=1n(uiTxj)2=σi2.\begin{aligned} ( \mathbf u_i ^T \mathbf X) (\mathbf u_i ^T \mathbf X)^T &= \sum_{j=1}^n (\mathbf u_i^T \mathbf x_j)^2 \\ &= \sigma_i^2. \end{aligned}

In other word, we would like to find WW that satisfies WTW=Σ1W^T W= \Sigma^{-1}. We can see this condition yields the diagonal coraviance condition:

X^X^T=(WX)T(WX)=XTWTWX=XTWTWX=XTΣ1X=XTUΛ1UTX=nI\begin{aligned} \hat{\mathbf X}\hat{\mathbf X}^T &= (W\mathbf X)^T (W\mathbf X) \\ &= \mathbf X^T W^T W\mathbf X \\ &= \mathbf X^T W^T W\mathbf X \\ &= \mathbf X^T \Sigma^{-1} \mathbf X \\ &= \mathbf X^T U \Lambda^{-1} U^T\mathbf X \\ &= nI \end{aligned}

Approach 1: PCA-Whitening

With this fact, the natural choice of the whitening operator is

ψPCA(X)=Λ1/2UT:=WPCAX.\psi_{\text{PCA}}(\mathbf X) = \underbrace{\Lambda^{-1/2} U^T}_{:=W_\text{PCA}} \mathbf X.

Thus, we can see that

WPCATWPCA=(Λ1/2UT)TΛ1/2UT=UΛ1/2Λ1/2UT=UΛ1UT=Σ1.\begin{aligned} W_{\text{PCA}}^TW_{\text{PCA}} &= (\Lambda^{-1/2} U^T)^T \Lambda^{-1/2} U^T \\ &= U \Lambda^{-1/2}\Lambda^{-1/2} U^T \\ &= U \Lambda^{-1}U^T \\ &= \Sigma^{-1}. \end{aligned}

Approach 2: ZCA-Whitening

However, whitening is not unique because whitened data remains whitened when transform. We can impose an additional constraint on ψ\psi. In particular, we can rotate the PCA-whitened data to be close to the original data. This is called Zero-Phase Component Analysis (ZCA):

ψZCA(X)=UΛ1/2UTX=UΛ1/2UTWZCAX.\begin{aligned} \psi_\text{ZCA}(\mathbf X) &= {U \Lambda^{-1/2}U^T}\mathbf X \\ &= \underbrace{U \Lambda^{-1/2}U^T}_{W_{\text{ZCA}}} \mathbf X. \end{aligned}

In fact, one can see that WZCA=Σ1/2W_\text{ZCA} = \Sigma^{-1/2}.

Approach 3: Cholesky Decomposition

For a positive definite matrix Σ\Sigma, we know that we can decompose it into

Σ=LLT,\Sigma = L L^T ,

where LL is a lower triangular matrix. Inverting the equation above gives us

Σ1=(LLT)1=(LT)1L1=(1)(L1)TL1,\begin{aligned} \Sigma^{-1} &= (LL^T)^{-1} \\ &= (L^T)^{-1} L^{-1} \\ &\overset{(1)}{=} (L^{-1})^{T} L^{-1}, \end{aligned}

where (1) uses the fact that matrix inverse and transpose are exchangeable. Here, it is obvious that we can take

Wchol=L1.W_\text{chol} = L^{-1}.

Because L1L^{-1} is also a lower triangular matrix. This invert can be computed efficiently using forward substitution.

Figure 1: Data whitened with various approaches. Although the results look similar, ZCA-whitened data is closed to the original.


These are resources that I consult while writing this blog: