Gabriel Pereyra, George Tucker, Jan Chorowski, Lukasz Kaiser, Geoffrey E. Hinton. Regularizing Neural Networks by Penalizing Confident Output Distributions. ICLR (Workshop), 2017.

Pereyra et al. propose an entropy regularizer that penalizes over-confident predictions of deep neural networks. Specifically, given the predicted distribution $p_\theta(y|x)$ over labels $y$ for a network with parameters $\theta$, a regularizer

$-\beta \max(0, \Gamma - H(p_\theta(y|x)))$

is added to the learning objective. Here, $H$ denotes the entropy, and $\beta$ and $\Gamma$ are hyper-parameters that weight and limit the regularizer's influence, respectively. In experiments, this regularizer slightly improved performance on MNIST and CIFAR-10.
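
To make the role of $\beta$ and $\Gamma$ concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names (`entropy`, `hinge_confidence_penalty`, `regularized_loss`) and the example values are hypothetical. One assumption matters in particular: the sketch adds the hinge term as a non-negative penalty to a loss that is minimized, so that predictions with entropy below $\Gamma$ cost more. The expression above is written with a leading minus sign, so this sign convention is a choice of the sketch, not something taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hinge_confidence_penalty(probs, beta, gamma):
    """Hinge term beta * max(0, gamma - H(probs)).

    Assumption of this sketch: the term is a non-negative penalty that is
    zero once the entropy reaches the threshold gamma.
    """
    return beta * max(0.0, gamma - entropy(probs))


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under the prediction."""
    return -math.log(probs[label])


def regularized_loss(probs, label, beta, gamma):
    """Cross-entropy plus the thresholded confidence penalty (to be minimized)."""
    return cross_entropy(probs, label) + hinge_confidence_penalty(probs, beta, gamma)


if __name__ == "__main__":
    # Hypothetical predictions over four classes; the true label is class 0.
    confident = [0.97, 0.01, 0.01, 0.01]
    uniform = [0.25, 0.25, 0.25, 0.25]
    beta, gamma = 0.1, 1.0  # illustrative hyper-parameter values

    for name, probs in [("confident", confident), ("uniform", uniform)]:
        print(name,
              "entropy:", entropy(probs),
              "penalty:", hinge_confidence_penalty(probs, beta, gamma),
              "loss:", regularized_loss(probs, 0, beta, gamma))
```

In this sketch, $\beta$ scales how strongly low-entropy predictions are penalized, while $\Gamma$ caps the effect: once a prediction's entropy exceeds $\Gamma$, the term vanishes and only the cross-entropy remains.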

