Vanishing Gradient Problem
- When training a neural network, if the gradients of the error function become very small, the weight updates (especially in the earlier layers) become negligible, so learning effectively stalls or makes extremely slow progress.
- As a result, training converges very slowly or not at all.
- The problem is often seen with Sigmoid activation functions, among other saturating functions, because their derivative approaches zero at the extremes of the input range.
- Faster hardware has helped make the slow training more tolerable, and non-saturating activation functions such as ReLU help as well, since their derivative does not shrink toward zero for positive inputs (see the sketch after this list).
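
To make the effect concrete, here is a minimal NumPy sketch (the layer count, layer width, weight scale, and names like `grad_norm_through_layers` are illustrative assumptions, not from any particular library) that pushes a gradient backwards through a stack of dense layers and compares the surviving gradient norm when using the Sigmoid derivative versus the ReLU derivative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad_norm_through_layers(activation_grad, n_layers=20, width=64):
    """Backpropagate a gradient through n_layers dense layers and
    return its norm, using the given activation derivative."""
    grad = np.ones(width)                      # upstream gradient from the loss
    for _ in range(n_layers):
        # Random weights scaled so W.T @ grad roughly preserves the norm on its own.
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        pre_act = rng.normal(size=width)       # stand-in for the layer's pre-activation
        grad = (W.T @ grad) * activation_grad(pre_act)  # chain rule for one layer
    return np.linalg.norm(grad)

# Sigmoid derivative: sigma(x) * (1 - sigma(x)), which is at most 0.25,
# so repeated multiplication drives the gradient toward zero.
sigmoid_grad = lambda x: sigmoid(x) * (1.0 - sigmoid(x))

# ReLU derivative: 1 for positive inputs, 0 otherwise, so active units
# pass the gradient through without shrinking it.
relu_grad = lambda x: (x > 0).astype(float)

print("sigmoid gradient norm:", grad_norm_through_layers(sigmoid_grad))
print("relu gradient norm:   ", grad_norm_through_layers(relu_grad))
```

Running this, the Sigmoid case collapses to a vanishingly small norm after a couple of dozen layers, while the ReLU case retains a gradient many orders of magnitude larger, which is the core of why non-saturating activations mitigate the problem.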