Papernot and McDaniel introduce deep k-nearest neighbors where nearest neighbors are found at each intermediate layer in order to improve interpretbaility and robustness. Personally, I really appreciated reading this paper; thus, I will not only discuss the actually proposed method but also highlight some ideas from their thorough survey and experimental results.
First, Papernot and McDaniel provide a quite thorough survey of relevant work in three disciplines: confidence, interpretability and robustness. To the best of my knowledge, this is one of few papers that explicitly make the connection of these three disciplines. Especially the work on confidence is interesting in the light of robustness as Papernot and McDaniel also frequently distinguish between in-distribution and out-distribution samples. Here, it is commonly known that deep neural networks are over-confidence when moving away from the data distribution.
The deep k-nearest neighbor approach is described in Algorithm 1 and summarized in the following. For a trained model and a training set of labeled samples, they first find k nearest neighbors for each intermediate layer of the network. The layer nonconformity with a specific label $j$, referred to as $\alpha$ in Algorithm 1, is computed as the number of labels that in the set of nearest neighbors that do not share this label. By comparing these nonconformity values to a set of reference values (computing over a set of labeled calibration data), the prediction can be refined. In particular, the probability for label $j$ can be computed as the fraction of reference nonconformity values that are higher than the computed one. See Algorthm 1 or the paper for details.
Finally, they provide experimental results – again considering the three disciplines of confidence/credibility, interpretability and robustness. The main take-aways are that the resulting confidences are more reliable on out-of-distribution samples, which also include adversarial examples. Additioanlly, the nearest neighbor allow very basic interpretation of the predictions.