Optimiser

As explained in Train a neural network, the formula for updating the weights of a neural network is:

$$w_{new} = w_{old} - \alpha \frac{d\text{error}}{dw}$$

$\alpha$ is called the learning rate and controls how quickly we move through the weight space to minimise the loss. $\alpha$ changes during the training process and is set by an algorithm called the optimiser.
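As a rough sketch of what this update rule looks like in code, the step below applies it to a NumPy array of weights; the function name `sgd_step` and the example values are made up for illustration:

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Apply one update: w_new = w_old - lr * d(error)/dw."""
    return w - lr * grad

# Hypothetical example: two weights and their error gradients.
w = np.array([0.5, -1.2])
grad = np.array([0.08, -0.3])
w = sgd_step(w, grad, lr=0.01)
print(w)  # weights nudged against the gradient
```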

There are multiple optimiser algorithms, including:

SGD

$\alpha$ decreases at a steady rate and tends to zero as the number of steps increases, for example:

$$\alpha = \frac{\alpha_{\text{start}}}{1 + \text{steps}}$$
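A minimal sketch of this kind of schedule, assuming the $\frac{\alpha_{\text{start}}}{1 + \text{steps}}$ decay above (the function name `decayed_lr` is made up):

```python
def decayed_lr(lr_start, step):
    """Learning rate that shrinks towards zero as the step count grows."""
    return lr_start / (1 + step)

# Hypothetical starting rate of 0.1: 0.1, 0.05, 0.033, 0.025, ...
for step in range(4):
    print(step, decayed_lr(0.1, step))
```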

Adam

The effective step size depends on the strength of recent updates, through the following equations:

$$m_{t+1} = \beta_1 m_t + (1 - \beta_1) \frac{d\text{error}}{dw}, \qquad v_{t+1} = \beta_2 v_t + (1 - \beta_2) \left(\frac{d\text{error}}{dw}\right)^2$$

$$\hat{m}_{t+1} = \frac{m_{t+1}}{1 - \beta_1^{t+1}}, \qquad \hat{v}_{t+1} = \frac{v_{t+1}}{1 - \beta_2^{t+1}}$$

$$w_{t+1} = w_t - \eta \frac{\hat{m}_{t+1}}{\sqrt{\hat{v}_{t+1}} + \epsilon}$$

$w_{t+1}$ is the new weight, $\eta$ is the base learning rate, $m$ and $v$ are moving averages of the gradient and of its square ($\hat{m}$ and $\hat{v}$ are their bias-corrected versions), and the $\beta$ values are fixed hyper-parameters chosen at the start of the training process based on trial and error and results from other papers.

The values $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ are known to work well.
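Putting the equations and these default values together, here is a minimal sketch of one Adam step, assuming the standard formulation; the name `adam_step` and the stand-in gradient are made up for illustration:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on weights w, given gradient grad and moment estimates m, v."""
    m = beta1 * m + (1 - beta1) * grad           # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad**2        # moving average of the squared gradient
    m_hat = m / (1 - beta1**t)                   # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # weight update
    return w, m, v

# Hypothetical usage on a small weight vector.
w = np.array([0.5, -1.2])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 4):
    grad = np.array([0.08, -0.3])  # stand-in gradient for this step
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```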