Subutai Ahmad, Luiz Scheinkman. How Can We Be So Dense? The Benefits of Using Highly Sparse Representations. CoRR abs/1903.11257 (2019).

Ahmad and Scheinkman propose a simple sparse layer to improve robustness against random noise. Specifically, considering a general linear network layer, i.e.,

$\hat{y}^l = W^l y^{l-1} + b^l$ and $y^l = f(\hat{y}^l)$

where $f$ is an activation function, the weights $W^l$ are first initialized from a sparse distribution, so that only a fraction of the entries are non-zero; then, the activation function (commonly ReLU) is replaced by a top-$k$ variant of ReLU that propagates only the $k$ largest activations and sets all others to zero. In experiments, this is shown to improve robustness against random noise on MNIST.
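To make the two ingredients concrete, here is a minimal NumPy sketch of such a layer as described in this summary. It is not the authors' implementation: the function names, the `density` parameter, the Gaussian scaling, and the use of a random binary mask to obtain sparse weights are all assumptions made for illustration, and any training-time details from the paper are omitted.

```python
import numpy as np


def sparse_weights(rng, n_out, n_in, density=0.3):
    """Gaussian weights multiplied by a random binary mask.

    `density` (expected fraction of non-zero weights) is an assumed
    hyperparameter, not a value taken from the paper.
    """
    w = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in * density)
    mask = rng.random((n_out, n_in)) < density
    return w * mask


def top_k_relu(y_hat, k):
    """Apply ReLU to the k largest pre-activations and zero out the rest."""
    out = np.zeros_like(y_hat)
    idx = np.argpartition(y_hat, -k)[-k:]  # indices of the k largest entries
    out[idx] = np.maximum(y_hat[idx], 0.0)
    return out


def sparse_layer(W, b, y_prev, k):
    """Compute y^l = f(W^l y^{l-1} + b^l) with f being the top-k ReLU."""
    y_hat = W @ y_prev + b
    return top_k_relu(y_hat, k)


rng = np.random.default_rng(0)
W = sparse_weights(rng, n_out=16, n_in=32, density=0.3)
b = np.zeros(16)
y = sparse_layer(W, b, rng.standard_normal(32), k=4)
```

By construction, the output `y` has at most `k` non-zero entries; it can have fewer if some of the `k` largest pre-activations are negative, since these are still clipped by the ReLU.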
