Activation Maximization

June 02, 2021


Denote by $\mathbf x \in \mathcal X := \reals^d$ a sample and consider a DNN $f: \mathcal X \rightarrow \reals^K$, where $K \in \mathbb N$ is the number of classes. The logit (value before applying softmax) of the $k$-th class is

$$ f_k(\mathbf x; \theta) := [f(\mathbf x; \theta )]_k, $$

where we explicitly write $\theta$ as the parameters of $f$.

The goal of activation maximization for the $k$-th class is to solve

$$ \mathbf x^* \leftarrow \argmax_{\mathbf x \in \mathcal X} f_k (\mathbf x ; \theta). $$

Since $f_k(\mathbf x; \theta) \in \reals$ is in general unbounded over $\mathcal X$, the optimization problem above is not well-defined: we can keep finding an $\mathbf x$ that makes $f_k(\cdot)$ larger. To illustrate, we consider

$$ f_\text{linear}(\mathbf x; \theta) = \mathbf x^T\mathbf w + b, $$

where $\theta = \{ \mathbf w \in \reals^d, b \in \reals \}$. Here, we can clearly see that scaling $\mathbf x$ up along the direction of $\mathbf w$ increases the value $f_\text{linear}(\mathbf x; \theta)$ without bound.

To make the optimization problem well-defined, we employ a regularizer $\Omega(\mathbf x)$, which allows us to specify what a suitable solution of the problem looks like. For example, one natural choice is the $l_2$ regularizer

$$ \Omega_{l_2}(\mathbf x) = - \lambda \| \mathbf x \|^2_2, $$

which favors solutions with small $l_2$ norm. Activation maximization (also known in the literature as feature visualization) becomes

$$ \mathbf x^* \leftarrow \argmax_{\mathbf x \in \mathcal X} \underbrace{f_k (\mathbf x ; \theta) + \Omega(\mathbf x)}_{=:\mathcal L (\mathbf x; \Omega_\lambda )}. $$

Example 1:

For the case of $\mathcal X = \reals$ and $\Omega = \Omega_{l_2}$, the objective function for $f_\text{linear}$ becomes concave, because $f_\text{linear}$ is linear and $\Omega_{l_2, \lambda} = -\lambda x^2$ is strictly concave for $\lambda > 0$.

Fig. 1: Components of the Loss for Example 1.

Because of concavity, we now have a closed-form solution for $f_\text{linear}$, obtained by setting the gradient to zero:

$$ \nabla_{\mathbf x} \mathcal L(\mathbf x; \Omega_{l_2, \lambda}) = \mathbf w - 2\lambda \mathbf x \implies \mathbf x^* = \frac{\mathbf w}{2\lambda}. $$
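As a quick sanity check of this closed form, the following minimal sketch runs plain gradient ascent on the regularized linear objective and compares the result with $\mathbf w / (2\lambda)$. The function name `grad_ascent_linear_l2`, the weight values, the step size, and the number of steps are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def grad_ascent_linear_l2(w, lam, lr=0.1, steps=500):
    """Gradient ascent on L(x) = x^T w + b - lam * ||x||_2^2.

    The bias b does not appear because it does not affect the gradient.
    """
    x = np.zeros_like(w, dtype=float)
    for _ in range(steps):
        grad = w - 2.0 * lam * x   # gradient of the objective w.r.t. x
        x = x + lr * grad          # ascent step (we maximize)
    return x

w = np.array([1.0, -2.0, 0.5])     # assumed example weights
lam = 0.5                          # assumed regularization strength

x_num = grad_ascent_linear_l2(w, lam)
x_closed = w / (2.0 * lam)         # closed-form maximizer from the text

print(x_num, x_closed)
```

With $\lambda = 0$ the same loop would grow $\mathbf x$ at every step and never settle, which is the ill-posedness the regularizer removes.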

Example 2:

Let $\mathcal X = \reals^2$ and denote $\mathbf x = (x_1, x_2)$. Consider $f(\mathbf x ) = \max(x_1, x_2)$ and $\Omega_{l_2,\lambda}$. The objective function is

$$ \mathcal L(\mathbf x ; \Omega_{l_2, \lambda}) = \max(x_1, x_2) - \lambda \| \mathbf x \|_2^2. $$

One observes that the level sets of $\lambda \| \mathbf x \|_2^2$ are circles (hyperspheres in higher dimensions). Let's assume $\lambda = 1$. In this case, because $\max(x_1, x_2)$ is symmetric in its arguments, we have two possible solutions, which lie where a level curve of the regularizer touches a level curve of $f$ (the level curve $f(\mathbf x ) = 3$ in Fig. 2).

Fig. 2: Components of the Loss for Example 2; dashed lines are level curves of the regularizer.
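The two symmetric solutions can also be checked numerically. The sketch below evaluates the penalized objective exactly as written above, with $\lambda = 1$, on a grid and collects all grid points that attain the maximum. The grid range and resolution are assumptions chosen for illustration, and the sketch does not try to reproduce the scaling used in Fig. 2.

```python
import numpy as np

lam = 1.0
grid = np.linspace(-1.0, 1.0, 401)                 # assumed grid, step 0.005
x1, x2 = np.meshgrid(grid, grid, indexing="ij")

# Penalized objective: max(x1, x2) - lam * ||x||_2^2
objective = np.maximum(x1, x2) - lam * (x1**2 + x2**2)

best = objective.max()
mask = np.isclose(objective, best, rtol=0.0, atol=1e-12)
maximizers = sorted(zip(x1[mask].round(3).tolist(), x2[mask].round(3).tolist()))

print(best, maximizers)
```

On the region $x_1 \ge x_2$ the objective reduces to $x_1 - x_1^2 - x_2^2$, which is maximized at $(1/2, 0)$; by symmetry the other maximizer is $(0, 1/2)$. The grid search recovers exactly these two points.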

In practice, from my experience, it is quite difficult to get visually understandable samples from this process, and there is a wide range of regularizers that one can employ. For the image domain, Olah et al. (distill.pub, 2017) provide an overview of this regularization spectrum.

Probabilistic Interpretation

Let $\omega_k$ denote the $k$-th class, for $k \in \{1, \dots, K\}$. Instead of taking $f_k(\mathbf x)$ to be the logit value, we could take it to be

$$ f_k(\mathbf x) := \log \mathbb{P}(\omega_k | \mathbf x). $$

Let $\Omega(\mathbf x) = \log \mathbb{P} (\mathbf x)$. Recall Bayes' rule

$$ \mathbb P (\omega_k | \mathbf x) \mathbb P (\mathbf x) = \mathbb P (\mathbf x | \omega_k) \mathbb P(\omega_k). $$

Hence, in this setting, the objective of activation maximization can be rewritten as

$$ \begin{aligned} \mathcal L(\mathbf x) &= f_k(\mathbf x) + \Omega (\mathbf x) \\ &= \log \mathbb P (\omega_k | \mathbf x) + \log \mathbb P(\mathbf x) \\ &= \log \mathbb P (\mathbf x | \omega_k) + \log \mathbb P(\omega_k). \end{aligned} $$

We can see here that the class marginal $\log \mathbb P(\omega_k)$ does not depend on $\mathbf x$ and hence has no influence on the solution of $\max_{\mathbf x } \mathcal L(\mathbf x)$. Therefore, we can view activation maximization as finding a prototypical sample for the given class $\omega_k$, whereas maximizing only $f_k(\mathbf x)$ finds the sample for which the model is most certain about the class $\omega_k$.
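The difference between the "most certain" and the "prototypical" sample can be seen in a one-dimensional toy problem. The sketch below assumes two classes with unit-variance Gaussian class-conditionals centered at $-1$ and $+1$ and equal priors; these densities, the grid, and the helper name `log_gauss` are hypothetical choices made only for illustration.

```python
import numpy as np

def log_gauss(x, mean, std=1.0):
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

x = np.linspace(-6.0, 6.0, 1201)                    # candidate inputs (assumed range)
log_prior = np.log(np.array([0.5, 0.5]))            # log P(omega_k), assumed equal
log_lik = np.stack([log_gauss(x, -1.0), log_gauss(x, 1.0)])  # log P(x | omega_k)

log_joint = log_lik + log_prior[:, None]            # log P(x, omega_k)
log_px = np.logaddexp(log_joint[0], log_joint[1])   # log P(x)
log_post = log_joint - log_px                       # log P(omega_k | x)

k = 1
x_certain = x[np.argmax(log_post[k])]               # maximize f_k only
x_prototype = x[np.argmax(log_post[k] + log_px)]    # maximize f_k + Omega

print(x_certain, x_prototype)
```

In this toy setting the posterior of class $k = 1$ keeps increasing with $x$, so `x_certain` ends up at the right edge of the grid, while `x_prototype` sits at the class mean.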

Implicit Density Models Perspective

A finer interpretation of activation maximization comes from viewing discriminative models as implicitly learning generative models. In particular, Srinivas and Fleuret (ICLR, 2021) propose to consider the joint distribution of $\omega_k$ and $\mathbf x$

$$ \mathbb{P}(\omega_k , \mathbf x) := \frac{\exp(f_k(\mathbf x; \theta))}{Z(\theta)}, $$

where $Z(\theta)$ is the normalization constant. In the following, we will also write $f_k(\mathbf x) := f_k(\mathbf x; \theta)$ to reduce notational clutter. First, marginalizing over the classes, we observe that

$$ \mathbb{P}(\mathbf x) = \frac{1}{Z(\theta)} \sum_{k'} \exp(f_{k'}(\mathbf x)). $$

Secondly, we know that

$$ \begin{aligned} \mathbb{P}(\omega_k | \mathbf x) &= \frac{1}{\mathbb{P}(\mathbf x)} \mathbb{P}(\omega_k, \mathbf x)\\ &= \bigg[ \frac{\cancel{Z(\theta)}}{\sum_{k'} \exp(f_{k'}(\mathbf x))} \bigg] \bigg[ \frac{\exp(f_k(\mathbf x) )}{\cancel{Z(\theta)}} \bigg ] \\ &=: [\text{Softmax}(f(\mathbf x))]_k. \end{aligned} $$
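The cancellation of $Z(\theta)$ is easy to confirm numerically. The following sketch builds the joint from a vector of logits at a single fixed input for two different, arbitrary values of the normalizer and compares the resulting conditional with a standard softmax. The logit values and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=5)            # f_1(x), ..., f_K(x) at one fixed x (assumed values)

def conditional_from_joint(logits, Z):
    joint = np.exp(logits) / Z         # P(omega_k, x) for each k
    marginal = joint.sum()             # P(x) = sum over k' of P(omega_k', x)
    return joint / marginal            # P(omega_k | x)

def softmax(logits):
    shifted = logits - logits.max()    # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

p_a = conditional_from_joint(logits, Z=1.0)
p_b = conditional_from_joint(logits, Z=123.4)   # a different, arbitrary normalizer

print(np.allclose(p_a, p_b), np.allclose(p_a, softmax(logits)))
```

Both conditionals coincide with the softmax output, so the classifier's predictions carry no information about $Z(\theta)$.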

Consider the conditional distribution of the sample given $\omega_k$

$$ \mathbb{P}(\mathbf x |\omega_k ) = \frac{\mathbb{P}(\omega_k, \mathbf x)}{\mathbb{P}(\omega_k)}. $$

Taking the logarithm yields

$$ \log \mathbb P(\mathbf x | \omega_k) = f_k(\mathbf x) - \log Z(\theta) - \log \mathbb{P}(\omega_k). $$

Because the second and third terms do not depend on $\mathbf x$, maximizing $f_k(\mathbf x)$ is thus equivalent to maximizing $\log \mathbb{P}(\mathbf x |\omega_k)$.
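To make this equivalence concrete, the sketch below replaces the continuous input space with a finite set of $N$ candidate inputs, so that $Z(\theta)$ becomes a plain sum. This discretization and the random logits are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 3, 50
f = rng.normal(size=(K, N))            # f[k, n] = f_k(x_n) on a finite input set (assumed)

Z = np.exp(f).sum()                    # normalizer over all (class, input) pairs
joint = np.exp(f) / Z                  # P(omega_k, x_n)
p_class = joint.sum(axis=1)            # P(omega_k)
log_cond = np.log(joint) - np.log(p_class)[:, None]      # log P(x_n | omega_k)

# Identity from the text: log P(x | omega_k) = f_k(x) - log Z - log P(omega_k)
identity_rhs = f - np.log(Z) - np.log(p_class)[:, None]

# The per-class maximizers agree because the offsets do not depend on x.
same_argmax = np.array_equal(f.argmax(axis=1), log_cond.argmax(axis=1))

print(np.allclose(log_cond, identity_rhs), same_argmax)
```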

Connection to Adversarial Robustness

It has been observed that performing activation maximization on adversarially robust models produces images that are more visually plausible than those from standard models. Some recent works in this direction include (in chronological order)

This phenomenon is an interesting connection between adversarial robustness and model interpretability.

Srinivas and Fleuret (ICLR, 2021) study this exact question through the implicit density view just described. More precisely, one of their key results is that making the implicit density of DNNs better aligned with the data distribution (via score matching [Hyvärinen (JMLR, 2005)]) improves the structure of gradient-based explanations.

Figure Taken From Srinivas and Fleuret (2021).


Activation maximization is a tool that one can use to study what features DNNs learn. Recent works have observed interesting properties of the synthetic images produced by this framework, and the connection between these properties and adversarial robustness seems prominent. However, despite such positive results, a recent human study [Borowski and Zimmermann et al. (ICLR, 2021)] shows that these synthetic images might not be that helpful for humans trying to understand models, compared with natural exemplar images.

This article is my recollection of Grégoire Montavon's ML 1 (WS2021), Lecture XAI, at TU Berlin.

The first two figures are made in Google Colab.