Pereyra et al. propose an entropy regularizer for penalizing over-confident predictions of deep neural networks. Specifically, given the predicted distribution $p_\theta(y_i|x)$ for labels $y_i$ and network parameters $\theta$, a regularizer
$-\beta \max(0, \Gamma – H(p_\theta(y|x))$
is added to the learning objective. Here, $H$ denotes the entropy and $\beta$, $\Gamma$ are hyper-parameters allowing to weight and limit the regularizers influence. In experiments, this regularizer showed slightly improved performance on MNIST and Cifar-10.
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: