in a deep neural network, backpropagation multiplies many derivatives (of activation functions and weight matrices) together via the chain rule - when these factors are consistently larger than 1, the product grows exponentially with depth and the gradient explodes (see the sketch after these notes)
the opposite of the Vanishing Gradient Problem, where the same product shrinks toward zero
more common with Recurrent Neural Networks, because the same weight matrix is multiplied into the gradient at every time step
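a minimal NumPy sketch of the effect (the step count, hidden size, and weight scale below are made-up illustration values, not from these notes): repeatedly multiplying the gradient by the same weight matrix, as backprop through an RNN effectively does, makes its norm blow up within a few dozen steps when the weights are scaled above 1.

```python
import numpy as np

np.random.seed(0)

depth = 50    # number of layers / time steps (illustrative value)
dim = 4       # hidden size (illustrative value)
scale = 1.5   # weight scale chosen > 1 to trigger the explosion

# shared weight matrix, as in an RNN unrolled over time
W = scale * np.random.randn(dim, dim) / np.sqrt(dim)

# gradient arriving at the last layer
grad = np.ones(dim)

for step in range(depth):
    # each backward step multiplies by the transposed weight matrix
    # (the activation derivative is treated as 1 here for simplicity)
    grad = W.T @ grad
    if step % 10 == 9:
        print(f"after {step + 1:3d} steps, gradient norm = {np.linalg.norm(grad):.3e}")
```

setting `scale` below 1 in the same loop shows the vanishing-gradient case instead: the norm shrinks toward zero rather than blowing up.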