21ST APRIL 2017

READING

D. Mishkin, J. Matas. All you need is a good init. CoRR, 2015.

Mishkin and Matas extend the orthonormal initialization scheme introduced in [1] by additionally rescaling the initialized weights such that the output of each layer has unit variance, as estimated on the first mini-batch. The full procedure is summarized in Algorithm 1.

function LSUV_initialization(
        $\tau$ // variance tolerance
    )
    pre-initialize weights with orthonormal matrices as in [1]
    for $l = 1,\ldots,L$ // for each layer
        // $x^{(l)}$ denotes the output tensor of layer $l$
        while $|\text{Var}[x^{(l)}] - 1| \geq \tau$ 
            forward pass to compute $x^{(l)}$
            calculate $\text{Var}[x^{(l)}]$
            // $W^{(l)}$ denotes the weights of layer $l$
            $W^{(l)} = \frac{W^{(l)}}{\sqrt{\text{Var}[x^{(l)}]}}$

Algorithm 1: LSUV - Layer-sequential unit-variance initialization.

The corresponding implementation is easy to understand and can be found here.
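
To make the procedure more concrete, the following is a minimal PyTorch-style sketch of LSUV for a plain feed-forward nn.Sequential model; the function name lsuv_init and the tol/max_iters parameters are my own choices for illustration, not taken from the paper or its reference implementation.

import torch
import torch.nn as nn

def lsuv_init(model, batch, tol=0.1, max_iters=10):
    # Sketch of LSUV for a feed-forward nn.Sequential model; tol plays the
    # role of the variance tolerance in Algorithm 1, max_iters is an assumed
    # safeguard against layers whose output variance does not converge.
    x = batch
    with torch.no_grad():
        for layer in model:
            if hasattr(layer, "weight") and layer.weight.dim() >= 2:
                # pre-initialize with an orthonormal matrix as in [1]
                nn.init.orthogonal_(layer.weight)
                # rescale the weights until the layer output has unit variance
                for _ in range(max_iters):
                    var = layer(x).var().item()
                    if abs(var - 1) < tol:
                        break
                    layer.weight.div_(var ** 0.5)
            # propagate the batch so the next layer sees the rescaled output
            x = layer(x)
    return model

# usage on a small two-layer network with a random batch:
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
lsuv_init(model, torch.randn(128, 784))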

  • [1] A. M. Saxe, J. L. McClelland, S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. CoRR, 2013.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below.