for i in range(100):
    preds = model(inputs)        # forward pass using the current w and b
    loss = mse(preds, targets)   # compute the loss
    loss.backward()              # compute gradients of loss w.r.t. w and b
    with torch.no_grad():
        w -= w.grad * 1e-5       # gradient-descent step on the weights
        b -= b.grad * 1e-5       # ...and on the bias
        w.grad.zero_()           # reset gradients so the next backward()
        b.grad.zero_()           # starts from zero instead of accumulating
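To see why a fresh gradient appears on every iteration even though w is never re-randomized, here is a minimal pure-Python sketch of the same loop for a single weight, with the MSE gradient written out by hand instead of calling loss.backward() (the data and numbers are illustrative, not from the thread):

```python
# 1-D linear model preds = w * x fitted by gradient descent.
# The gradient is recomputed from the *current* w on every pass,
# just as loss.backward() does after w has been updated in-place.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # true relationship: y = 2x

w = 0.5                     # one starting weight; never re-randomized
lr = 1e-2

for step in range(200):
    preds = [w * x for x in xs]                                   # forward pass
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)  # MSE
    # analytic d(loss)/dw for MSE of a linear model:
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= lr * grad          # update the SAME w; next pass sees the new value

print(round(w, 3))          # → 2.0 (the true slope)
```

The gradient changes each iteration because it is a function of w, and w itself changed on the previous step; no re-randomization is needed.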

In this loop, shouldn't we calculate the loss after adjusting w and b? My doubt is: since we zero w.grad on each iteration, how are we calculating a new gradient, given that we are not picking a new random w (weight)?

w -= w.grad * 1e-5 and b -= b.grad * 1e-5 adjust your weights and bias.

w.grad.zero_() and b.grad.zero_() reset the gradients to zero because otherwise, when the for loop goes into its next iteration, backward() would add the newly calculated gradient on top of the existing gradient values.
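PyTorch's backward() accumulates into .grad rather than overwriting it. A hedged pure-Python simulation of that accumulation semantics (not real torch code; the data, learning rate, and the run helper are all illustrative) shows how skipping the reset changes the updates:

```python
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy data: y = 2x

def fresh_grad(w):
    # analytic d(mse)/dw for the linear model preds = w * x
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def run(zero_grad):
    w, grad = 0.5, 0.0
    for step in range(5):
        grad += fresh_grad(w)   # like loss.backward(): ADDS into .grad
        w -= 1e-2 * grad        # like w -= w.grad * lr
        if zero_grad:
            grad = 0.0          # the w.grad.zero_() call in the loop
    return w

print(run(zero_grad=True), run(zero_grad=False))  # the trajectories differ
```

Without the reset, every step mixes stale gradients from earlier iterations into the update direction, which is why the zeroing belongs inside the loop.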

As you can see from the above, we are:

getting predictions, then

calculating loss, then

calculating gradients, then

adjusting our weights, then

resetting gradients for our next calculation.

The reason you do it in that order, instead of re-calculating the loss right after adjusting the weights and biases, is that until you run the forward pass again your predictions have not changed, and so your loss has not changed. Does this clear up your confusion?

Nah, we'll have to calculate the loss first to find d(loss) w.r.t. w and b.
After calculating the loss, the gradients are multiplied by the learning rate
(w -= w.grad * 1e-5
b -= b.grad * 1e-5)
which gives the new weight and bias.
In order to find these gradients, we have to compute the loss.
And after getting the loss and reassigning the weights and bias, on every batch (or epoch) we recompute the loss with these new weights and bias.
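The point about recomputing the loss with the new weights can be made concrete: if you track the loss across iterations, it should fall as w and b move toward better values. A hedged sketch with toy data and hand-derived gradients in place of autograd (all names and constants are illustrative):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 7.0, 9.0, 11.0]          # true relationship: y = 2x + 3

def mse(w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b, lr = 0.0, 0.0, 0.05
first_loss = mse(w, b)              # loss under the initial weights
for step in range(2000):
    # analytic gradients of MSE w.r.t. w and b
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w, b = w - lr * gw, b - lr * gb
final_loss = mse(w, b)              # recomputed with the NEW w and b
print(first_loss > final_loss)      # → True: the loss falls as weights improve
```

The loss is recomputed at the top of each iteration, so "predicting the loss with the new weights" is exactly what the next forward pass does.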

If you are wondering about initializing the weights and bias: they are set to random values (drawn from a normal distribution) to avoid gradient descent (in our case) getting stuck at a local minimum.