Hi all, please bear with me if my questions sound trivial… I've only just ventured into this field, so my foundation is pretty weak…

Looking at the PyTorch code for any network (linear regression, CNN, etc.), I don't understand how it is possible to differentiate a number (to get the gradient)… it's actually partial differentiation, isn't it? But it's still just a number… For example: d/dx(xy) = y based on partial differentiation, but what is the meaning of d/dx(12.22)? It's a number, so how do you differentiate it?
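To make the question concrete, here is a minimal sketch of the d/dx(xy) case in PyTorch. The values 3.0 and 4.0 and the variable names are my own, picked only for illustration. As far as I can tell, the result prints like a plain number but also carries a `grad_fn` that records how it was computed, which may be the part I am missing:

```python
import torch

# Hypothetical values, chosen only for illustration.
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)

z = x * y        # the value is 12.0, but z also remembers it came from x * y
print(z)         # shows the value together with a grad_fn entry
print(z.grad_fn) # the recorded multiplication node

z.backward()     # walks the recorded graph backwards, starting from z

print(x.grad)    # dz/dx, which equals the value of y
print(y.grad)    # dz/dy, which equals the value of x
```

So it looks like what gets differentiated is the recorded expression x * y, evaluated at the current values, rather than the bare number itself.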

In neural networks, the loss function (BCE, cross entropy, etc.) is not passed into the optimizer (SGD, Adam, etc.)… so when backward() is called on the loss, whatever value is obtained, how does the optimizer know about it, and how is it able to update the parameters in the model with a scaled-down version based on the loss (loss * learning rate, right?)? And which step actually does the backward propagation? Is it backward()? If so, why is it still necessary to call optimizer.step()?
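Here is a minimal sketch of the training step I am asking about, assuming a tiny `torch.nn.Linear` model, plain SGD without momentum, and made-up data (all of these are my own choices for illustration). It snapshots the weights before and after each call so the effect of `backward()` and `optimizer.step()` can be compared:

```python
import torch

torch.manual_seed(0)

model = torch.nn.Linear(2, 1)
# The optimizer never sees the loss; it is only handed the model's parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

# Made-up data, for illustration only.
inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])

weight_before = model.weight.detach().clone()

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)

loss.backward()  # computes gradients and stores them in each parameter's .grad
weight_after_backward = model.weight.detach().clone()
grad = model.weight.grad.clone()

optimizer.step()  # reads each parameter's .grad and updates the parameter in place
weight_after_step = model.weight.detach().clone()

print(torch.equal(weight_before, weight_after_backward))  # weights untouched by backward()
print(torch.allclose(weight_after_step, weight_before - 0.1 * grad))
```

If I read this correctly, the link between the two is the `.grad` attribute on the shared parameters, and for plain SGD the update is learning rate times gradient rather than learning rate times loss.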

In GANs, why is the generator able to produce the fake image based on the loss obtained from the discriminator? The discriminator's loss is just a tensor with requires_grad=True, so basically it's just a number… Why is it that the generator can use it to progressively generate more meaningful images instead of noise?
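A stripped-down sketch of the generator update I mean is below. The two `Linear` layers stand in for a real generator and discriminator, and the sizes (4, 8, 16) are hypothetical. The fake batch is passed to the discriminator without detaching it, so the loss stays connected to the generator's weights:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: noise (4 values) -> "image" (8 values) -> real/fake logit.
generator = torch.nn.Linear(4, 8)
discriminator = torch.nn.Linear(8, 1)
loss_fn = torch.nn.BCEWithLogitsLoss()

noise = torch.randn(16, 4)
fake = generator(noise)        # not detached, so the graph leads back to the generator
logits = discriminator(fake)

# The generator is trained to make the discriminator output "real" (label 1).
g_loss = loss_fn(logits, torch.ones(16, 1))
g_loss.backward()

print(g_loss.grad_fn)                # the loss is a scalar, but it carries its history
print(generator.weight.grad.shape)   # one gradient entry per generator weight
```

It seems the scalar loss is only the starting point, and `backward()` pushes a gradient through the discriminator into every generator weight.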

In a classifier, to obtain the probability of the selected class, do I apply sigmoid after softmax (which gives the selected class), or is there a better way?

Your enlightenment is very much appreciated. Thank you very much!
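For reference, this is a minimal sketch of what I have tried so far, assuming the model returns raw logits for a single sample with three classes (the numbers are made up). Here softmax alone already gives values that sum to 1, and taking the maximum returns both the probability and the class index, with no sigmoid involved:

```python
import torch

# Hypothetical raw model outputs (logits) for one sample and three classes.
logits = torch.tensor([[2.0, 1.0, 0.1]])

probs = torch.softmax(logits, dim=1)            # one probability per class
confidence, predicted_class = probs.max(dim=1)  # highest probability and its index

print(probs)
print(predicted_class.item(), confidence.item())
```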