Provable Algorithms for Training ReLU Networks

We propose and analyze algorithms for training ReLU networks with skip connections. Skip connections are the key feature of residual networks (or ResNets), which have been shown to provide superior performance in deep learning applications. We analyze two approaches for training such networks, gradient descent and alternating minimization, and compare the convergence guarantees of both methods. We show that under standard Gaussianity assumptions on the $d$-dimensional input data, both gradient descent and alternating minimization provably converge at a linear rate, given a sufficiently good initialization; moreover, we show that a simple "identity" initialization suffices. Furthermore, we provide statistical upper bounds indicating that $n = O(d^3)$ samples suffice to achieve this convergence rate. To our knowledge, these constitute the first global parameter recovery guarantees for shallow ResNet-type networks with ReLU activations.
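
As an illustration of the gradient-descent variant, here is a minimal sketch (not the authors' code). It assumes a shallow residual model of the form $y = \mathbf{1}^\top \mathrm{ReLU}((I + W)x)$ with Gaussian inputs, uses the "identity" initialization $W = 0$, and runs plain gradient descent on the squared loss; the model form, step size, and problem sizes are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

# Toy instantiation: input dimension d, with n on the order of d^3 samples.
rng = np.random.default_rng(0)
d, n = 20, 20 ** 3

# Ground-truth residual weights (small, so I + W_star stays close to identity).
W_star = 0.1 * rng.standard_normal((d, d))

relu = lambda z: np.maximum(z, 0.0)
X = rng.standard_normal((n, d))                    # Gaussian input data
y = relu(X @ (np.eye(d) + W_star).T).sum(axis=1)   # labels from the true network

W = np.zeros((d, d))   # "identity" initialization: I + W = I
step = 1.0 / d         # heuristic step size (an assumption, not from the paper)

for it in range(1000):
    pre = X @ (np.eye(d) + W).T            # pre-activations, shape (n, d)
    resid = relu(pre).sum(axis=1) - y      # prediction errors
    # Gradient of the squared loss (1/2n) * ||resid||^2 with respect to W
    grad = ((resid[:, None] * (pre > 0)).T @ X) / n
    W -= step * grad

print("relative parameter error:",
      np.linalg.norm(W - W_star) / np.linalg.norm(W_star))
```

In this regime the printed relative error shrinks toward zero, consistent with the linear-rate parameter recovery described above; the alternating-minimization variant would instead resolve the ReLU activation pattern and the weights in separate steps.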

Publications
  1. G. Jagatap and C. Hegde, “Linearly Convergent Algorithms for Learning Shallow Residual Networks”, January 2018. [Paper / Code]

People

Chinmay Hegde
Assistant Professor, ECPE