The loss function

Is it always true that the loss function follows a decreasing trend over the chosen number of epochs? I ask because in the lecture it decreased to 16 and then jumped back up to a higher value, 23.

The value returned by the loss function may oscillate around a local or global minimum. Sometimes this means it rises again, only to fall back after a few epochs (perhaps because the network found a better minimum).

You’re confusing the trend with the epoch-by-epoch plot of the loss. Overall, the loss decreases, even if it spikes for a while.
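
To make the trend-versus-plot distinction concrete, here is a minimal sketch that smooths a loss curve with a moving average. The loss values are made up for illustration (only the 16 and 23 echo the question), and `moving_average` is a helper defined here, not something from the lecture.

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values to expose the trend."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical per-epoch losses with a spike from 16 back up to 23.
raw_losses = [40, 31, 25, 16, 23, 18, 14, 15, 11, 9]
trend = moving_average(raw_losses, window=3)

for epoch, value in enumerate(trend):
    print(f"window starting at epoch {epoch}: {value:.2f}")
```

The raw list goes up twice (16 to 23 and 14 to 15), but every smoothed value is lower than the one before it, which is what "the trend decreases" means here.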

The loss can also start rising and keep rising if the network begins to overfit. What usually matters to the person training the network is its behavior on unseen data (the test set), and that’s what’s usually plotted in graphs. But the network is trained on different data (the training set). If the network fits the training data too closely, the loss on the test set goes up (simply put: the network memorizes that 2+2=4 but forgets what addition is, so 2+3 gives a wrong result).
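
One common way to tell a temporary spike from overfitting is to watch the loss on held-out data and stop once it has failed to improve for a few epochs (often called early stopping). Below is a rough sketch under stated assumptions: the loss lists are invented, and `early_stopping_epoch` and `patience` are names chosen for this example rather than any library's API.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the best held-out loss, scanning until the loss
    has failed to improve for `patience` epochs in a row."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs: likely overfitting
    return best_epoch


# Hypothetical curves: training loss keeps falling, held-out loss turns upward.
train_losses = [28, 20, 14, 10, 7, 5, 3]
overfit_val_losses = [30, 24, 19, 16, 17, 19, 22]
# Hypothetical curve with a single spike that recovers.
spiky_val_losses = [30, 24, 26, 19, 16]

print(early_stopping_epoch(overfit_val_losses))
print(early_stopping_epoch(spiky_val_losses))
```

In the first curve the held-out loss never gets back below 16, so the scan stops and reports epoch 3. In the second the spike at epoch 2 is followed by new lows, so training would simply continue.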


Probably what’s happening is that the learning rate is too large. In that case, the loss will bounce around the minimum but never converge.
There’s a tradeoff when setting the learning rate between a) training speed and b) predictive accuracy. Some optimizers adjust the learning rate during training so you can have the best of both worlds.
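
Here is a toy sketch of that effect using plain gradient descent on f(w) = w**2, whose minimum is at w = 0. Everything in it is an assumption made for illustration: the function, the starting point, the learning rates, and the simple exponential decay, which stands in for the more elaborate schemes that real optimizers use.

```python
def gradient_descent(lr, decay=1.0, w0=4.0, steps=20):
    """Minimize f(w) = w**2 and return the loss recorded before each step.

    `decay` multiplies the learning rate after every step (1.0 = constant).
    """
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(w ** 2)
        w -= lr * 2 * w   # gradient of w**2 is 2*w
        lr *= decay
    return losses


small = gradient_descent(lr=0.1)               # slow but steady
large = gradient_descent(lr=1.0)               # w flips between 4 and -4 forever
decayed = gradient_descent(lr=1.0, decay=0.9)  # starts large, then shrinks

for name, losses in [("small", small), ("large", large), ("decayed", decayed)]:
    print(f"{name:8s} first={losses[0]:.4f} last={losses[-1]:.8f}")
```

With the large constant rate, each step overshoots to the mirror-image point on the other side of the minimum, so the loss never drops. Shrinking the rate over time lets the same starting rate settle much closer to the minimum than the small constant rate does in the same number of steps.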


Glad to see your valuable and prompt reply.

Thanks for your prompt response to my question.