Vanishing Gradient Problem

  • When training a neural network, if the gradients of the error function with respect to the weights become very small, those weights effectively stop updating or make extremely slow progress.
  • As a result, training converges slowly or not at all.
  • This is commonly seen with sigmoid activation functions, among others, because they saturate at the extremes, where their derivative approaches zero (see the first sketch below).
  • Non-saturating activation functions such as ReLU help mitigate the problem (see the second sketch below), and faster hardware has helped by making the remaining slow progress less costly.
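
The saturation effect is easy to see numerically. Below is a minimal sketch (plain Python, no framework, illustrative only) of the sigmoid derivative and of how the chained derivative factors shrink as depth grows:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def sigmoid_deriv(x):
        # Derivative of the sigmoid: s * (1 - s).
        # It peaks at 0.25 when x == 0 and approaches 0 at the extremes (saturation).
        s = sigmoid(x)
        return s * (1.0 - s)

    # The derivative is already tiny for moderately large inputs.
    for x in (0.0, 2.0, 5.0, 10.0):
        print(f"sigmoid'({x:>4}) = {sigmoid_deriv(x):.6f}")

    # By the chain rule, a gradient flowing back through n sigmoid layers picks up
    # one activation-derivative factor per layer, so that part of the gradient is
    # bounded by 0.25**n even in the best case.
    for n in (2, 5, 10, 20):
        print(f"{n:>2} layers: activation-derivative bound = {0.25 ** n:.2e}")

With 20 layers that bound is already around 1e-12, which is why the early layers of a deep sigmoid network learn so slowly when the units saturate.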
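
For contrast, a sketch (same assumptions) of the ReLU derivative: it is exactly 1 for any positive input, so the per-layer factor does not shrink for active units:

    def relu_deriv(x):
        # Derivative of ReLU: 1 for positive inputs, 0 otherwise
        # (undefined at exactly 0; 0 is used here by convention).
        return 1.0 if x > 0 else 0.0

    for x in (0.5, 2.0, 5.0, 10.0):
        print(f"relu'({x:>4}) = {relu_deriv(x):.1f}")  # stays 1.0, no saturation

    # Product of 20 per-layer activation-derivative factors:
    print("sigmoid, 20 layers:", 0.25 ** 20)  # ~9e-13, effectively vanished
    print("relu,    20 layers:", 1.0 ** 20)   # 1.0, magnitude preserved on active paths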