David Madras, James Atwood, Alex D'Amour. Detecting Extrapolation with Influence Functions. ICML Workshop, 2019.

Madras et al. use influence functions to detect novel/out-of-distribution examples. Influence functions measure the impact of a specific training example on a model’s parameters. Concretely, the influence of an example $z'$ can be computed as

$\mathcal{I}(z') = - H_{\theta^*}^{-1} \nabla_{\theta^*} l(z', \theta^*)$

where $\theta^*$ are the model parameters (trained without $z'$), $l$ is the corresponding training loss (here, $z = (x,y)$ includes both the input example and its label) and $H$ is the corresponding Hessian. This formulation can be thought of as a Newton step that reduces the loss $l(z', \theta^*)$. In practice, the Hessian $H$ needs to be regularized in order to be positive semi-definite. The authors then define three different metrics based on this formulation of influence functions in order to assess novelty. In essence, these metrics all take the form

$\nabla_{\theta^*} l(z', \theta^*)^T H_{\theta^*}^{-k} \nabla_{\theta^*} l(z', \theta^*)$
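
As a rough illustration of how such a quadratic form could be evaluated, the following NumPy sketch uses a small logistic regression model, where gradient and Hessian are available in closed form. The function names, the `damping` term added to the Hessian, the integer exponent `k`, the placeholder parameters and the toy data are all assumptions made for this example; this is not the authors' implementation and does not reproduce their three metrics.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss_grad(theta, x, y):
    # Gradient of the logistic loss l(z, theta) for a single example z = (x, y), y in {0, 1}.
    return (sigmoid(x @ theta) - y) * x

def damped_hessian(theta, X, damping=1e-2):
    # Hessian of the mean logistic loss over the training inputs X,
    # plus damping * I as an assumed, simple form of regularization.
    p = sigmoid(X @ theta)
    H = (X * (p * (1.0 - p))[:, None]).T @ X / len(X)
    return H + damping * np.eye(len(theta))

def influence(theta, X, x_new, y_new, damping=1e-2):
    # -H^{-1} grad l(z', theta), computed with a linear solve instead of an explicit inverse.
    H = damped_hessian(theta, X, damping)
    return -np.linalg.solve(H, loss_grad(theta, x_new, y_new))

def novelty_score(theta, X, x_new, y_new, k=1, damping=1e-2):
    # grad^T H^{-k} grad for a non-negative integer k, via k successive linear solves.
    H = damped_hessian(theta, X, damping)
    g = loss_grad(theta, x_new, y_new)
    v = g.copy()
    for _ in range(k):
        v = np.linalg.solve(H, v)
    return float(g @ v)

# Toy data: training inputs vary mostly along the first axis.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2)) * np.array([1.0, 0.05])
theta = np.zeros(2)  # placeholder parameters, not a trained model

print(novelty_score(theta, X_train, np.array([1.0, 0.0]), 1))  # direction covered by the data
print(novelty_score(theta, X_train, np.array([0.0, 1.0]), 1))  # direction barely covered
```

In this toy setup, the second test point should receive the larger score, since the training inputs barely vary along its direction and the damped Hessian is therefore small there.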

In the paper, the above quadratic form is linked to the eigendecomposition of the Hessian. The intuition is that predictions that are not supported by the training data correspond to gradient directions aligned with eigenvectors that have small eigenvalues. In experiments, the metrics are tested in a simple synthetic case, as well as on MNIST.
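
To make this link more explicit, here is a short worked expansion. It assumes that the (regularized) Hessian is symmetric positive definite with orthonormal eigenvectors $v_i$ and eigenvalues $\lambda_i > 0$, and it abbreviates the gradient as $g = \nabla_{\theta^*} l(z', \theta^*)$. The symbols $g$, $v_i$ and $\lambda_i$ are introduced here for illustration and are not taken from the paper. With $H_{\theta^*} = \sum_i \lambda_i v_i v_i^T$, the quadratic form becomes

$g^T H_{\theta^*}^{-k} g = \sum_i \frac{(v_i^T g)^2}{\lambda_i^k}$

so the component of the gradient along each eigenvector is weighted by $1/\lambda_i^k$. Gradient directions aligned with eigenvectors of small eigenvalues therefore dominate the sum, which matches the intuition above.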
