21ST APRIL 2017

READING

A. M. Saxe, J. L. McClelland, S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. CoRR, 2013.

Saxe et al. give a mathematically concise discussion of deep linear networks in order to evaluate the benefit of pre-training for initialization. While I highly recommend the read for all machine learning practitioners interested in deep learning, the involved mathematics exceeds the intended scope of my reading notes. Therefore, I only give the main conclusions as they are also used in the literature (e.g. in [1] to refine the proposed initialization scheme).

  • Pre-training using auto-encoders improves convergence and yields better performance if the input-output correlations resemble the input-input correlations. Without going into the mathematical argument, this might be interpreted as the auto-encoder loss already yielding useful kernels/weights for the actual supervised task.
  • Instead of random Gaussian initialization of the weight matrices, Saxe et al. recommend initializing the weight matrices with random orthogonal matrices; a minimal sketch follows after this list. This can also be extended to convolutional neural networks, as discussed for example in Hendrik Weideman's blog.
  • [1] D. Mishkin, J. Matas. All you need is a good init. CoRR, 2015.
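For illustration, here is a minimal NumPy sketch of orthogonal initialization via the QR decomposition of a Gaussian random matrix; the function name orthogonal_init and the gain parameter are my own choices for this example, not part of the paper.

```python
import numpy as np

def orthogonal_init(rows, cols, gain=1.0, rng=None):
    """Return a (rows, cols) weight matrix with orthonormal rows or columns."""
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal((rows, cols))
    # QR decomposition of a Gaussian matrix yields an orthogonal factor Q.
    q, r = np.linalg.qr(a if rows >= cols else a.T)
    # Sign correction so the result is not biased towards particular orientations.
    q *= np.sign(np.diag(r))
    if rows < cols:
        q = q.T
    return gain * q

# Example: initialize a 256 -> 128 fully connected layer.
W = orthogonal_init(128, 256)
print(np.allclose(W @ W.T, np.eye(128)))  # rows are orthonormal -> True
```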

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below.