# DAVID STUTZ

11th AUGUST 2018

$R(W) = \|W^TW - I\|$
where $I$ is the identity matrix. During training, this regularizer is supposed to ensure that the learned weight matrices are orthonormal, which offers an efficient alternative to regular matrix manifold optimization techniques (see the paper).
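
To make the regularizer concrete, here is a minimal NumPy sketch of how $R(W)$ could be evaluated. The choice of norm is not stated above, so the Frobenius norm is an assumption here, and the function name `orthonormal_regularizer` is purely illustrative, not taken from the paper.

```python
import numpy as np


def orthonormal_regularizer(W):
    """Evaluate R(W) = ||W^T W - I||.

    Assumption: the norm is the Frobenius norm (the text does not specify it).
    """
    W = np.asarray(W, dtype=float)
    gram = W.T @ W                  # pairwise inner products of the columns of W
    identity = np.eye(W.shape[1])   # I has one row/column per column of W
    return np.linalg.norm(gram - identity, ord="fro")


# A matrix with orthonormal columns (from a QR decomposition) gives R(W) close to 0.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 3)))
print(orthonormal_regularizer(Q))

# Scaling the columns breaks orthonormality, so the penalty grows.
print(orthonormal_regularizer(2.0 * Q))
```

In training, a term like this would typically be added to the loss with a weighting factor, so that it only penalizes deviation from orthonormality rather than enforcing it exactly.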