Formulation
Denote a sample $x \in \mathbb{R}^d$ and consider a DNN $f : \mathbb{R}^d \to \mathbb{R}^K$, where $K$ is the number of classes. The logit (value before applying softmax) of the $c$-th class is

$$f_c(x; \theta),$$

where we explicitly write $\theta$ as the parameters of $f$.
The goal of activation maximization for the $c$-th class is to solve

$$\max_{x} \ f_c(x; \theta).$$
Recall that $x \in \mathbb{R}^d$ is unconstrained, so the optimization problem above is not well-defined: we can always find $x$ that makes $f_c(x; \theta)$ larger. To illustrate, we consider

$$f_c(x; \theta) = w^\top x + b,$$

where $\theta = (w, b)$. Here, we can clearly see that making $x$ larger in the direction of $w$ directly increases the value $f_c(x; \theta)$.
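To see this concretely, here is a minimal numerical sketch (the weight vector and bias are made-up values for illustration): scaling $x$ along $w$ makes the logit grow without bound.

```python
import numpy as np

# Toy illustration for the linear logit f_c(x; theta) = w^T x + b.
# The values of w and b are arbitrary assumptions for this sketch.
w = np.array([1.0, -2.0])
b = 0.1

for t in [1, 10, 100, 1000]:
    x = t * w                 # move further and further along w
    print(t, w @ x + b)       # the logit w^T x + b grows without bound
```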
To make the objective well-defined, we employ a regularizer $\Omega(x)$, which allows us to specify what a suitable solution should look like. For example, one natural choice is the regularizer

$$\Omega(x) = \lambda \lVert x \rVert^2,$$

which prefers the solution with the smallest norm. Activation maximization (in the literature also known as feature visualization) then becomes

$$\max_{x} \ f_c(x; \theta) - \lambda \lVert x \rVert^2.$$
For the case of the linear logit $f_c(x; \theta) = w^\top x + b$ and $\Omega(x) = \lambda \lVert x \rVert^2$, the objective function becomes concave due to the convexity of $\lVert x \rVert^2$. Because of concavity, we now have a closed-form solution for the maximizer $x^\star$:

$$x^\star = \frac{w}{2\lambda}.$$
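As a quick sanity check, here is a small sketch (with made-up $w$, $b$, and $\lambda$) that runs plain gradient ascent on the regularized objective and compares the result with the closed-form maximizer $w / (2\lambda)$.

```python
import numpy as np

# Sanity check for the linear case f_c(x; theta) = w^T x + b with the
# regularizer lambda * ||x||^2. All values below are illustrative assumptions.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
b = 0.3
lam = 0.5

# Gradient ascent on f_c(x) - lam * ||x||^2; the gradient is w - 2 * lam * x.
x = np.zeros(5)
for _ in range(1000):
    x += 0.05 * (w - 2 * lam * x)

x_star = w / (2 * lam)          # closed-form maximizer
print(np.allclose(x, x_star))   # True
```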
Example 2:
Let $d = 2$ and denote $x = (x_1, x_2)$. Consider a nonlinear logit $f_c$ and the regularizer $\Omega(x) = \lambda \lVert x \rVert^2$. The objective function is

$$f_c(x; \theta) - \lambda \lVert x \rVert^2.$$

One observes that the level set of $\lVert x \rVert^2$ is a circle (or a hypersphere in higher dimensions). We observe that in this case we can have two possible solutions, namely the points where a level curve of the regularizer touches the level curve of $f_c$ at the largest attainable objective value.
In practice, from my experience, it is quite difficult to get visually understandable samples from this process, and there is a wide spectrum of regularizers one can employ. For the image domain, Olah et al. (distill.pub, 2017) provide an overview of this regularization spectrum.
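For a concrete picture of what the procedure looks like in the image domain, here is a minimal PyTorch sketch of gradient-ascent activation maximization with the $\lambda \lVert x \rVert^2$ regularizer. The model (an untrained ResNet-18 from torchvision), class index, learning rate, and number of steps are all assumptions made for illustration; in practice, stronger regularization (e.g., the transformations surveyed by Olah et al.) is usually needed to obtain recognizable images.

```python
import torch
import torchvision

# Minimal sketch of activation maximization by gradient ascent.
# Model choice and hyperparameters are illustrative assumptions.
model = torchvision.models.resnet18(weights=None).eval()

c = 0                                   # class index to maximize
lam = 1e-3                              # regularization strength lambda
x = torch.zeros(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)

for _ in range(256):
    opt.zero_grad()
    logit = model(x)[0, c]                    # f_c(x; theta)
    loss = -(logit - lam * x.pow(2).sum())    # maximize f_c - lambda * ||x||^2
    loss.backward()
    opt.step()
```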
Probabilistic Interpretation
Denote $\omega_c$ the index of the $c$-th class for $c \in \{1, \dots, K\}$. Instead of taking $f_c(x; \theta)$ to be the logit value, we could take it to be the log-posterior

$$f_c(x; \theta) = \log p(\omega_c \mid x).$$

Let $\Omega(x) = \log p(x)$. Recall Bayes' rule

$$p(\omega_c \mid x) = \frac{p(x \mid \omega_c)\, p(\omega_c)}{p(x)}.$$

Hence, in this setting, the objective of activation maximization can be rewritten as

$$\log p(\omega_c \mid x) + \log p(x) = \log p(x \mid \omega_c) + \log p(\omega_c).$$

We can see here that the marginal distribution of the class, $p(\omega_c)$, does not depend on $x$ and hence has no influence on the solution $x^\star$. Therefore, we can view activation maximization with this objective as finding a prototypical sample for the given class $\omega_c$, while maximizing only $\log p(\omega_c \mid x)$ is to find the sample for which the model is the most certain about the class $\omega_c$.
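A toy 1-D example makes the distinction concrete. In the sketch below, the two Gaussian class-conditional densities and the uniform class prior are assumptions made for illustration: the posterior $p(\omega_1 \mid x)$ keeps increasing as we move away from the data, whereas $p(x \mid \omega_1)$ peaks at the class prototype.

```python
import numpy as np

def gauss(x, mu, sigma=1.0):
    # Gaussian density; used here as an assumed class-conditional p(x | omega_c).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-10, 10, 2001)
p_x_c1 = gauss(xs, mu=-2.0)             # p(x | omega_1)
p_x_c2 = gauss(xs, mu=+2.0)             # p(x | omega_2)
p_x = 0.5 * p_x_c1 + 0.5 * p_x_c2       # p(x) with prior p(omega_c) = 1/2

p_c1_x = 0.5 * p_x_c1 / p_x             # Bayes' rule: p(omega_1 | x)

print(xs[np.argmax(p_c1_x)])   # -10.0: the "most certain" sample runs off the grid
print(xs[np.argmax(p_x_c1)])   # -2.0: the prototype of class omega_1
```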
Implicit Density Models Perspective
A finer interpretation of activation maximization can be obtained through the view of implicit generative models learned by discriminative models. In particular, Srinivas and Fleuret (ICLR, 2021) propose to consider the joint distribution between $x$ and $\omega_c$

$$p(x, \omega_c; \theta) = \frac{\exp(f_c(x; \theta))}{Z(\theta)},$$

where $Z(\theta) = \sum_{c} \int \exp(f_c(x; \theta)) \, \mathrm{d}x$ is the normalization constant. In the following, we will also write $f_c := f_c(x; \theta)$ to reduce notational clutter. First, we observe that

$$p(\omega_c \mid x) = \frac{p(x, \omega_c; \theta)}{\sum_{c'} p(x, \omega_{c'}; \theta)} = \frac{\exp(f_c)}{\sum_{c'} \exp(f_{c'})},$$

which is exactly the softmax output of the DNN.
Secondly, we know that the marginal distribution of the class is

$$p(\omega_c; \theta) = \int p(x, \omega_c; \theta) \, \mathrm{d}x = \frac{\int \exp(f_c(x; \theta)) \, \mathrm{d}x}{Z(\theta)}.$$
Consider the conditional distribution of the sample given the class:

$$p(x \mid \omega_c; \theta) = \frac{p(x, \omega_c; \theta)}{p(\omega_c; \theta)} = \frac{\exp(f_c)}{Z(\theta)\, p(\omega_c; \theta)}.$$

Taking the logarithm yields

$$\log p(x \mid \omega_c; \theta) = f_c - \log Z(\theta) - \log p(\omega_c; \theta).$$

Because the second and third terms do not depend on $x$, maximizing $\log p(x \mid \omega_c; \theta)$ is thus equivalent to maximizing the logit $f_c$.
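The equivalence can be checked numerically on a grid. The sketch below uses two hand-picked 1-D logit functions (assumptions for illustration) and verifies that the maximizer of $p(x \mid \omega_c; \theta)$ coincides with the maximizer of $f_c$.

```python
import numpy as np

# Numerical check on a 1-D grid; the two logit functions below are
# arbitrary assumptions for this sketch.
xs = np.linspace(-5, 5, 1001)
f = np.stack([
    -(xs - 1.0) ** 2,             # f_1(x)
    np.sin(xs) - 0.1 * xs ** 2,   # f_2(x)
])

joint = np.exp(f)                                       # proportional to p(x, omega_c; theta)
p_x_given_c = joint / joint.sum(axis=1, keepdims=True)  # p(x | omega_c; theta) on the grid

# Z(theta) and p(omega_c; theta) do not depend on x, so the maximizers agree.
print(xs[np.argmax(p_x_given_c, axis=1)])  # per-class maximizers of p(x | omega_c)
print(xs[np.argmax(f, axis=1)])            # identical: maximizers of the logits
```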
Connection to Adversarial Robustness
It has been observed that performing activation maximization on adversarially robust models produces images that are more visually plausible than those from standard models. Some recent works in this direction include (in chronological order):
- Ross and Doshi-Velez (AAAI, 2018), "Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing Their Input Gradients"
- Etmann and Lunz et al. (ICML, 2019), "On the Connection Between Adversarial Robustness and Saliency Map Interpretability"
- Wang et al. (OpenReview, 2019), "Smooth Kernels Improve Adversarial Robustness and Perceptually-Aligned Gradients"
- Boopathy et al. (ICML, 2020), "Proper Network Interpretability Helps Adversarial Robustness in Classification"
- Mangla et al. (ECML/PKDD, 2020), "On Saliency Maps and Adversarial Robustness"
This phenomenon points to an interesting connection between adversarial robustness and model interpretability.
Srinivas and Fleuret (ICLR, 2021) study this exact question via the view of implicit density models just mentioned. More precisely, one of their key results is that making the implicit density of DNNs better aligned with the data distribution (via score matching [Hyvärinen (JMLR, 2005)]) improves the structure of gradient-based explanations.
Conclusions
Activation maximization is a tool that one can use to study what features DNNs learn. Recent works have observed interesting properties of the synthetic images produced by this framework, and the connection between these properties and adversarial robustness seems prominent. However, despite such positive results, a recent human study [Borowski and Zimmermann et al. (ICLR, 2021)] shows that these synthetic images might not be more helpful for humans to understand models than natural exemplar images.
This article is my recollection of Grégoire Montavon's ML 1 (WS2021), Lecture XAI, at TU Berlin.
The first two figures are made in Google Colab.