Jonathan Frankle, Michael Carbin. The Lottery Ticket Hypothesis: Training Pruned Neural Networks. CoRR abs/1803.03635 (2018).

Frankle and Carbin discover so-called winning tickets, subset of weights of a neural network that are sufficient to obtain state-of-the-art accuracy. The lottery hypothesis states that dense networks contain subnetworks – the winning tickets – that can reach the same accuracy when trained in isolation, from scratch. The key insight is that these subnetworks seem to have received optimal initialization. Then, given a complex trained network for, e.g., Cifar, weights are pruned based on their absolute value – i.e., weights with small absolute value are pruned first. The remaining network is trained from scratch using the original initialization and reaches competitive performance using less than 10% of the original weights. As soon as the subnetwork is re-initialized, these results cannot be reproduced though. This suggests that these subnetworks obtained some sort of “optimal” initialization for learning.

Also find this summary on ShortScience.org.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.