Seong Joon Oh, Max Augustin, Bernt Schiele, Mario Fritz. Whitening Black-Box Neural Networks. CoRR abs/1711.01768, 2017.

Oh et al. propose two different approaches for whitening black-box neural networks, i.e., predicting details of their internals such as architecture or training procedure. In particular, they consider attributes regarding the architecture (activation function, dropout, max pooling, kernel size of convolutional layers, number of convolutional/fully connected layers etc.), attributes concerning optimization (batch size and optimization algorithm) and attributes regarding the data (data split and size). To create a dataset of models, they trained roughly 11k models on MNIST, ensuring that each model reaches at least 98% accuracy on the validation set; they also consider ensembles.

For predicting model attributes, they propose two models, called kennen-o and kennen-i, see Figure 1. kennen-o takes as input a set of $100$ predictions of the black-box model (i.e., its final probability distributions on $100$ query inputs) and tries to directly learn the attributes using an MLP of two fully connected layers. kennen-i instead crafts a single input for which the model's output allows inferring a specific model attribute. An example for kennen-i is shown in Figure 2. In experiments, they demonstrate that both models are able to predict model attributes significantly better than chance. For details, I refer to the paper.
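To make the kennen-o idea more concrete, the following is a minimal sketch of its forward pass, not the authors' implementation: the probability outputs for $100$ queries are concatenated into one feature vector and passed through an MLP with two fully connected layers that predicts a single attribute. The hidden width, the number of attribute values, the random queries and the toy linear "black box" are all assumptions made for illustration; training the metamodel on the dataset of models is omitted.

```python
import numpy as np

NUM_QUERIES = 100     # number of query inputs, as stated in the summary
NUM_CLASSES = 10      # MNIST has ten classes
HIDDEN = 64           # hypothetical hidden width (assumption)
NUM_ATTR_VALUES = 4   # hypothetical number of values of one attribute (assumption)


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def query_black_box(black_box, queries):
    # Only the output probabilities are observed; flatten them into one vector.
    probs = black_box(queries)            # shape (NUM_QUERIES, NUM_CLASSES)
    return probs.reshape(-1)              # shape (NUM_QUERIES * NUM_CLASSES,)


def init_metamodel(rng):
    # Parameters of an MLP with two fully connected layers.
    in_dim = NUM_QUERIES * NUM_CLASSES
    return {
        "W1": rng.normal(0.0, 0.01, (in_dim, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.01, (HIDDEN, NUM_ATTR_VALUES)),
        "b2": np.zeros(NUM_ATTR_VALUES),
    }


def metamodel_forward(params, features):
    # ReLU hidden layer, then a distribution over the attribute's values.
    h = np.maximum(0.0, features @ params["W1"] + params["b1"])
    return softmax(h @ params["W2"] + params["b2"])


rng = np.random.default_rng(0)
queries = rng.random((NUM_QUERIES, 28 * 28))   # stand-in for 100 query images

# Toy stand-in for a black-box MNIST classifier (assumption, not a trained model).
W_toy = rng.normal(0.0, 0.1, (28 * 28, NUM_CLASSES))


def toy_black_box(x):
    return softmax(x @ W_toy)


features = query_black_box(toy_black_box, queries)
params = init_metamodel(rng)
attr_probs = metamodel_forward(params, features)
```

In practice, the metamodel would be trained with a cross-entropy loss over many (model outputs, attribute label) pairs drawn from the collection of trained MNIST models; here the weights are random, so `attr_probs` is only meaningful as a shape check.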

Figure 1: Illustration of the two proposed approaches, kennen-o (top) and kennen-i (bottom).

Figure 2: Illustration of the images created by kennen-i to classify different attributes. See the paper for details.

Also find this summary on ShortScience.org.