IAM

17th October 2019

READING

Subutai Ahmad, Luiz Scheinkman. How Can We Be So Dense? The Benefits of Using Highly Sparse Representations. CoRR abs/1903.11257 (2019).

Ahmad and Scheinkman propose a simple sparse layer in order to improve robustness against random noise. Specifically, considering a general linear network layer, i.e.

$\hat{y}^l = W^l y^{l-1} + b^l$ and $y^l = f(\hat{y}^l)$

where $f$ is an activation function, the weights are first initialized using a sparse distribution; then, the activation function (commonly ReLU) is replaced by a top-$k$ ReLU version where only the top-$k$ activations are propagated. In experiments, this is shown to improve robustness against random noise on MNIST.
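The two ingredients can be sketched as follows; this is a minimal NumPy illustration, not the authors' implementation, and the function names, the sparsity fraction, and the weight scale are assumptions for the example:

```python
import numpy as np

def sparse_init(shape, sparsity=0.5, rng=None):
    # Hypothetical sparse initialization: zero out a random
    # fraction (sparsity) of the weights at initialization.
    rng = rng or np.random.default_rng(0)
    w = rng.standard_normal(shape) * 0.1
    mask = rng.random(shape) < sparsity
    w[mask] = 0.0
    return w

def topk_relu(x, k):
    # Top-k ReLU: apply ReLU, then keep only the k largest
    # activations and zero out the rest.
    x = np.maximum(x, 0.0)
    out = np.zeros_like(x)
    idx = np.argsort(x)[-k:]  # indices of the k largest values
    out[idx] = x[idx]
    return out
```

In a forward pass, `topk_relu` would simply replace the plain ReLU after the linear layer, so at most $k$ units per layer are active regardless of the input.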

Also find this summary on ShortScience.org.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: