## Formulation

Let $\mathbf x \in \mathcal X := \reals^d$ denote a sample, and consider a DNN $f: \mathcal X \rightarrow \reals^K$, where $K \in \mathbb N$ is the number of classes. The logit (the value before applying softmax) of the $k$-th class is

$$
f_k(\mathbf x; \theta),
$$

where we explicitly write $\theta$ as the parameters of $f$.

The goal of activation maximization for the $k$-th class is to solve

$$
\max_{\mathbf x \in \mathcal X} f_k(\mathbf x; \theta).
$$

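Recall that $f_k(\mathbf x; \theta) \in \reals$ is unbounded in general, so the optimization problem above is not well defined: we can typically find an $\mathbf x$ that makes $f_k(\cdot)$ arbitrarily large. To illustrate, consider the linear model

$$
f_\text{linear}(\mathbf x; \theta) = \mathbf w^\top \mathbf x + b,
$$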

where $\theta = \{ \mathbf w \in \reals^d, b \in \reals \}$. Here, we can clearly see that moving $\mathbf x$ further along the direction of $\mathbf w$ increases the value $f_\text{linear}(\mathbf x; \theta)$ without bound.
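The unboundedness can be checked numerically. The following is a minimal sketch in plain Python; the parameter values `w` and `b` and the list of scales are illustrative assumptions, not values from the lecture.

```python
def f_linear(x, w, b):
    """Linear logit w^T x + b for plain Python lists."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Illustrative parameters (assumed, not from the article).
w, b = [1.0, -2.0], 0.5
scales = [1, 10, 100, 1000]

# Move x along the direction of w: x = t * w, so f = t * ||w||^2 + b.
values = [f_linear([t * wi for wi in w], w, b) for t in scales]

for t, v in zip(scales, values):
    print(f"t = {t:5d}  f_linear = {v:.1f}")
```

The logit grows linearly in the scale `t`, so no finite maximizer exists without some form of regularization.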

To make the optimization problem well defined, we employ a regularizer $\Omega(\mathbf x)$, which allows us to specify what a suitable solution of the problem looks like. For example, one natural choice is the $l_2$ regularizer

$$
\Omega_{l_2, \lambda}(\mathbf x) = -\lambda \| \mathbf x \|_2^2, \qquad \lambda > 0,
$$

which prefers solutions with a small $l_2$ norm. Activation maximization (also known in the literature as feature visualization) then becomes

$$
\max_{\mathbf x \in \mathcal X} f_k(\mathbf x; \theta) + \Omega(\mathbf x).
$$

For the case of $\mathcal X = \reals$ and $\Omega = \Omega_{l_2, \lambda}$, the objective function for $f_\text{linear}$ becomes strictly concave, because $f_\text{linear}$ is linear in $x$ and $\Omega_{l_2, \lambda}(x) = -\lambda x^2$ is strictly concave.

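Because of concavity, we now have a closed-form solution for $f_\text{linear}$. Setting the derivative of $wx + b - \lambda x^2$ to zero gives $w - 2\lambda x = 0$, hence

$$
x^\star = \frac{w}{2\lambda}.
$$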
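As a sanity check, plain gradient ascent on the one-dimensional objective $wx + b - \lambda x^2$ should converge to the stationary point $w / (2\lambda)$. The sketch below assumes illustrative values for `w`, `b`, `lam`, the step size, and the number of steps; the helper `ascend` is a hypothetical name.

```python
def ascend(w, lam, x0=0.0, lr=0.1, steps=200):
    """Gradient ascent on w*x + b - lam*x**2 (b does not affect the maximizer)."""
    x = x0
    for _ in range(steps):
        grad = w - 2.0 * lam * x  # derivative of the objective w.r.t. x
        x += lr * grad
    return x


# Illustrative values (assumed, not from the article).
w, b, lam = 3.0, 1.0, 0.5

x_closed = w / (2.0 * lam)   # stationary point of the concave objective
x_numeric = ascend(w, lam)

print(x_closed, x_numeric)
```

With these values each step contracts the error by the factor $1 - 2\lambda \cdot \text{lr} = 0.9$, so the iterate agrees with the closed form to many digits after 200 steps.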

**Example 2:**

Let $\mathcal X = \reals^2$ and denote $\mathbf x = (x_1, x_2)$. Consider $f(\mathbf x ) = \max(x_1, x_2)$ and $\Omega_{l_2,\lambda}$. The objective function is

$$
\max(x_1, x_2) - \lambda \| \mathbf x \|_2^2.
$$

One observes that the level sets of $\lambda \| \mathbf x \|_2^2$ are circles (or hyperspheres in higher dimensions), so the regularizer acts like a circular constraint. Let's assume $\lambda = 1$. In this case, there are two possible solutions, located where a level curve of the regularizer touches the level curve at $f(\mathbf x ) = 3$.
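The existence of two symmetric solutions can also be seen with a brute-force search. The sketch below evaluates the penalized objective $\max(x_1, x_2) - \lambda \| \mathbf x \|_2^2$ with $\lambda = 1$ on a grid; the grid range and resolution are assumptions chosen for illustration, and the regularizer is treated as a penalty rather than as a hard constraint.

```python
def objective(x1, x2, lam=1.0):
    """Penalized objective max(x1, x2) - lam * ||x||_2^2."""
    return max(x1, x2) - lam * (x1 * x1 + x2 * x2)


# Assumed search grid: [-2, 2] x [-2, 2] with step 0.01.
n = 401
grid = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]

scores = {(x1, x2): objective(x1, x2) for x1 in grid for x2 in grid}
best = max(scores.values())

# Collect every grid point that attains the maximum (up to rounding).
maximizers = sorted(p for p, s in scores.items() if s >= best - 1e-12)

print(best, maximizers)
```

On this grid the search returns two maximizers that mirror each other across the line $x_1 = x_2$, which reflects the symmetry of $\max(x_1, x_2)$ in its arguments.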

In practice, from my experience, it is quite difficult to get visually understandable samples from this process, and there seems to be a wide range of regularizers that one can employ. For the image domain, Olah et al. (distill.pub, 2017) provide an overview of this regularization spectrum.

## Probabilistic Interpretation

Let $\omega_k$ denote the $k$-th class for $k \in \{1, \dots, K\}$. Instead of taking $f_k(\mathbf x)$ to be the logit value, we could take it to be the log-posterior of the class,

$$
f_k(\mathbf x) = \log \mathbb{P}(\omega_k | \mathbf x).
$$

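Let $\Omega(\mathbf x) = \log \mathbb{P} (\mathbf x)$. Recall Bayes' rule

$$
\mathbb{P}(\mathbf x | \omega_k) = \frac{\mathbb{P}(\omega_k | \mathbf x) \, \mathbb{P}(\mathbf x)}{\mathbb{P}(\omega_k)}.
$$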

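Hence, in this setting, the objective of activation maximization can be rewritten as

$$
\mathcal L(\mathbf x) = \log \mathbb{P}(\omega_k | \mathbf x) + \log \mathbb{P}(\mathbf x) = \log \mathbb{P}(\mathbf x | \omega_k) + \log \mathbb{P}(\omega_k).
$$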

We can see here that the marginal distribution of the class does not depend on $\mathbf x$ and hence has no influence on the solution of $\max_{\mathbf x } \mathcal L(\mathbf x)$. Therefore, we can view activation maximization as finding a prototypical sample for the given class $\omega_k$, whereas maximizing only $f_k(\mathbf x)$ finds the sample for which the model is most certain about the class $\omega_k$.
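The difference between the "most certain" sample and the "prototypical" sample can be illustrated on a toy discrete problem. The sketch below assumes $f_k(\mathbf x) = \log \mathbb{P}(\omega_k | \mathbf x)$ and $\Omega(\mathbf x) = \log \mathbb{P}(\mathbf x)$; the joint probability table over three inputs and two classes is made up purely for illustration.

```python
import math

# Hypothetical joint table: joint[x] = (P(x, class 0), P(x, class 1)).
joint = {0: (0.30, 0.25), 1: (0.04, 0.01), 2: (0.10, 0.30)}
k = 0  # class of interest

p_x = {x: sum(p) for x, p in joint.items()}    # marginal P(x)
p_class = sum(p[k] for p in joint.values())    # marginal P(class k)

log_posterior = {x: math.log(joint[x][k] / p_x[x]) for x in joint}  # f_k(x)
log_prior_x = {x: math.log(p_x[x]) for x in joint}                  # Omega(x)
log_cond = {x: math.log(joint[x][k] / p_class) for x in joint}      # log P(x | class k)

# Regularized objective L(x) = f_k(x) + Omega(x).
objective = {x: log_posterior[x] + log_prior_x[x] for x in joint}

most_certain = max(joint, key=log_posterior.get)  # maximizes f_k alone
prototype = max(joint, key=objective.get)         # maximizes L

print(most_certain, prototype)
```

In this table, the input on which the classifier is most confident is a rare one, while the regularized objective selects the input that is most likely under the class-conditional distribution.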

### Implicit Density Models Perspective

A finer interpretation of activation maximization is possible through the view of implicit generative models learned by discriminative models. In particular, Srinivas and Fleuret (ICLR, 2021) propose to consider the joint distribution between $\omega_k$ and $\mathbf x$

$$
\mathbb{P}(\mathbf x, \omega_k) = \frac{\exp\big(f_k(\mathbf x; \theta)\big)}{Z(\theta)},
$$

where $Z(\theta)$ is the normalization constant. In the following, we will also write $f_k(\mathbf x) := f_k(\mathbf x; \theta)$ to reduce notational clutter. First, we observe that

Secondly, we know that

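Consider the conditional distribution of the sample given $\omega_k$

$$
\mathbb{P}(\mathbf x | \omega_k) = \frac{\mathbb{P}(\mathbf x, \omega_k)}{\mathbb{P}(\omega_k)} = \frac{\exp\big(f_k(\mathbf x)\big)}{Z(\theta) \, \mathbb{P}(\omega_k)}.
$$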

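Taking the logarithm yields

$$
\log \mathbb{P}(\mathbf x | \omega_k) = f_k(\mathbf x) - \log Z(\theta) - \log \mathbb{P}(\omega_k).
$$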

Because the second and third terms do not depend on $\mathbf x$, maximizing $f_k(\mathbf x)$ is thus equivalent to maximizing $\log \mathbb{P}(\mathbf x |\omega_k)$.
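This equivalence can be verified on a small example. The sketch below assumes the joint model $\mathbb{P}(\mathbf x, \omega_k) = \exp(f_k(\mathbf x)) / Z(\theta)$ and replaces the continuous input space with three discrete points so that $Z(\theta)$ is a finite sum; the logit table is hypothetical.

```python
import math

# Hypothetical logits: logits[x] = (f_0(x), f_1(x)) on a 3-point input space.
logits = {0: (2.0, 0.5), 1: (0.1, 1.5), 2: (-1.0, 0.0)}
k = 0  # class of interest

# Normalization constant: sum of exp(f_j(x)) over all inputs and classes.
Z = sum(math.exp(f) for fs in logits.values() for f in fs)

p_joint = {x: math.exp(fs[k]) / Z for x, fs in logits.items()}  # P(x, class k)
p_class = sum(p_joint.values())                                 # P(class k)

# Left-hand side: log P(x | class k) computed from the probabilities.
log_cond = {x: math.log(p_joint[x] / p_class) for x in logits}

# Right-hand side: f_k(x) - log Z - log P(class k).
decomp = {x: logits[x][k] - math.log(Z) - math.log(p_class) for x in logits}

argmax_logit = max(logits, key=lambda x: logits[x][k])
argmax_cond = max(log_cond, key=log_cond.get)

print(argmax_logit, argmax_cond)
```

Both sides agree for every input, and the logit and the class-conditional log-density share the same maximizer, since they differ only by terms that are constant in the input.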

## Connection to Adversarial Robustness

It has been observed that performing activation maximization on adversarially robust models produces images that are more visually plausible than those from standard models. Some recent works in this direction include (in chronological order):

- Ross and Doshi-Velez (AAAI, 2018), "Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing Their Input Gradients"
- Etmann and Lunz et al. (ICML, 2019), "On the Connection Between Adversarial Robustness and Saliency Map Interpretability"
- Wang et al. (OpenReview, 2019), "Smooth Kernels Improve Adversarial Robustness and Perceptually-Aligned Gradients"
- Boopathy et al. (ICML, 2020), "Proper Network Interpretability Helps Adversarial Robustness in Classification"
- Mangla et al. (ECML/PKDD, 2020), "On Saliency Maps and Adversarial Robustness"

This phenomenon suggests an interesting connection between adversarial robustness and model interpretability.

Srinivas and Fleuret (ICLR, 2021) study this exact question through the view of implicit density models just described. More precisely, one of their key results is that making the implicit density of DNNs better aligned with the data distribution (via score matching [Hyvärinen (JMLR, 2005)]) improves the structure of gradient-based explanations.

## Conclusions

Activation maximization is a tool that one can use to study what features DNNs learn. Recent works have observed interesting properties of the synthetic images this framework produces, and the connection between these properties and adversarial robustness seems prominent. However, despite such positive results, a recent human study [Borowski and Zimmermann et al. (ICLR, 2021)] shows that these synthetic images might not be as helpful for humans trying to understand models as natural exemplar images.

This article is my recollection of Grégoire Montavon's ML 1 (WS2021), Lecture XAI, at TU Berlin.

The first two figures are made in Google Colab.