Vanishing Gradient Problem

Written
  • When training a neural network, if the gradients of the error function become very small as they are propagated backward through the layers, the earlier layers effectively stop updating, or make extremely slow progress.
  • As a result, training converges very slowly or not at all.
  • This is a problem sometimes seen with Sigmoid activation functions, among others, due to their saturating nature at the extremes.
  • Faster hardware has helped with this issue, and non-saturating activation functions such as ReLU help as well (see the sketch after this list).
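
Here's a minimal sketch of why this happens, using my own illustration rather than anything from the list above: the gradient reaching an early layer includes a product of activation derivatives, one per layer. Sigmoid's derivative is at most 0.25, so that product shrinks quickly with depth, while ReLU's derivative is 1 wherever a unit is active. The layer count and random pre-activation values below are arbitrary choices for demonstration.

```python
import numpy as np

def sigmoid_derivative(x):
    """Derivative of the sigmoid function; its maximum value is 0.25 at x = 0."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

rng = np.random.default_rng(0)
# Hypothetical pre-activation values, one per layer of a 20-layer network.
pre_activations = rng.normal(size=20)

# The backpropagated gradient is scaled by the product of these derivatives.
factor = np.prod(sigmoid_derivative(pre_activations))
print(f"Gradient scaling after 20 sigmoid layers: {factor:.2e}")
# Prints a vanishingly small number, so updates to the earliest layers are
# effectively zero. With ReLU, the per-layer factor is 1 for active units,
# so the product does not collapse in the same way.
```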
