In the previous video we saw that the model with a single layer using cross entropy is linear, and in this video we created 2 layers. Because the 2nd layer uses ReLU, which is a non-linear function, the model becomes non-linear.
But just wondering, isn’t cross entropy also non-linear? It’s a logarithmic function.
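For reference, here is a minimal pure-Python sketch of cross entropy for a single example, computed from raw outputs (logits) as log-softmax followed by negative log-likelihood, which is the combination `F.cross_entropy` applies per sample. The function name and input values are illustrative, not PyTorch's implementation. It confirms the question's premise: because of the log (and the exp inside softmax), cross entropy is not a linear function of its input.

```python
import math

def cross_entropy(logits, target):
    """Cross entropy for one example: -log(softmax(logits)[target]).

    Written as logsumexp(logits) - logits[target], which is the same
    quantity in a numerically stable form.
    """
    m = max(logits)  # subtract the max before exp() to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
midpoint = [(p + q) / 2 for p, q in zip(a, b)]

# An affine ("linear" in the lesson's sense) function g would satisfy
# g(midpoint) == (g(a) + g(b)) / 2. Cross entropy does not, because of the log.
lhs = cross_entropy(midpoint, 0)
rhs = (cross_entropy(a, 0) + cross_entropy(b, 0)) / 2
print(math.isclose(lhs, rhs))  # False
```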
It depends on where the cross entropy is used.
If it’s applied to the output of the model (i.e. used before getting the predictions), then the whole model can be considered non-linear.
If it’s applied to the targets and the model’s predictions (as an error/cost function), and the model itself doesn’t use any non-linear activation function, then the model is linear, but the error function isn’t.
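The two cases above can be checked numerically. This is a minimal pure-Python sketch, assuming a made-up single layer (the `W` and `b` values are hypothetical) and using log-softmax as a stand-in for a log-based function applied to the model's output. The plain layer passes an affine-map test; the same layer with log-softmax inside the model does not.

```python
import math

# Hypothetical single layer: 2 inputs -> 3 outputs (made-up values).
W = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.0]]
b = [0.1, 0.0, -0.3]

def linear_model(x):
    """y' = W x + b: the single-layer model with no activation."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def log_softmax(v):
    """A log-based, non-linear function of the whole output vector."""
    m = max(v)
    lse = m + math.log(sum(math.exp(u - m) for u in v))
    return [u - lse for u in v]

def model_with_log_softmax(x):
    """The same layer, but with log_softmax applied to its output inside the model."""
    return log_softmax(linear_model(x))

def preserves_midpoints(f, a, c):
    """An affine map satisfies f((a + c) / 2) == (f(a) + f(c)) / 2."""
    mid = [(p + q) / 2 for p, q in zip(a, c)]
    lhs = f(mid)
    rhs = [(p + q) / 2 for p, q in zip(f(a), f(c))]
    return all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(lhs, rhs))

x1, x2 = [1.0, 0.0], [0.0, 3.0]
print(preserves_midpoints(linear_model, x1, x2))            # True: the model itself is linear
print(preserves_midpoints(model_with_log_softmax, x1, x2))  # False: the non-linearity is inside the model
```

In the lesson's setup the loss is computed outside the model, so only the first check describes the model itself.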
You are right, it was used before generating the predictions.
I guess the difference is that ReLU was used to generate the outputs for the 2nd layer, so it came before generating the predictions, whereas in the previous lesson cross entropy was only used to evaluate the predictions.
```python
import torch.nn.functional as F

def training_step(self, batch):
    images, labels = batch
    out = self(images)                   # Generate predictions
    loss = F.cross_entropy(out, labels)  # Calculate loss
    return loss
```
Makes sense now.
Thanks a lot.
I actually meant the insides of the model more than any training step.
y' = f(x) → if f is non-linear, then the transformation is considered non-linear as well.
z = g(y, y') → this is something totally separate from the above transformation.
If f is linear and g is non-linear, the transformation above is still considered linear. This is because your y' (the predictions, or whatever you call them) does not depend on g (you can’t take z and call it a prediction, because it’s a totally different measure).
Notice that it’s usually hard to confuse the final activation function with the loss/cost function → the latter accepts two inputs, because its task is to measure how well the model has performed.
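The f / g split can be written out directly. This is a minimal sketch, assuming a hypothetical one-input model f(x) = 2x + 1 and, for g, binary cross entropy on a logit (swapped in for the multi-class cross entropy from the lesson so everything stays scalar). The predictions y' are computed by f alone; g only consumes them, together with the targets, to produce z.

```python
import math

def f(x):
    """Hypothetical one-input linear model: y' = f(x) = 2x + 1 (a raw output/logit)."""
    return 2.0 * x + 1.0

def g(y, y_pred):
    """Binary cross entropy on a logit: z = g(y, y') = log(1 + e^y') - y * y'.

    Unlike f, it takes two inputs: the target y (0 or 1) and the model output y'.
    """
    return math.log1p(math.exp(y_pred)) - y * y_pred

xs = [0.0, 1.0, 2.0, 3.0]  # equally spaced inputs
targets = [1, 1, 1, 1]

predictions = [f(x) for x in xs]                          # y' is computed by f alone
losses = [g(y, p) for y, p in zip(targets, predictions)]  # z only consumes y and y'

# Equally spaced inputs give equally spaced y' (f is linear) ...
print([p1 - p0 for p0, p1 in zip(predictions, predictions[1:])])  # [2.0, 2.0, 2.0]
# ... but not equally spaced z (g is non-linear).
print([z1 - z0 for z0, z1 in zip(losses, losses[1:])])
```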
Thank you for the explanation. I think I get it now.
In the previous lesson we had only 1 layer, where we simply applied the linear equation to calculate the output, so there was nothing non-linear there. Cross entropy was only used to calculate the loss.
In this lesson, by contrast, we had 2 layers: first we apply the linear equation for layer 1, then the ReLU activation function (which is non-linear), and finally the linear equation again for layer 2 to calculate the output.
So because ReLU is part of the calculation that produces the output (which I believe is what you refer to as y’ above), the model is considered non-linear.
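The difference between the two lessons can be sketched in pure Python. This assumes a tiny hypothetical network (1 input, 2 hidden units, 1 output, with made-up weights) that mirrors the layer 1 → ReLU → layer 2 structure. Without an activation the two linear layers collapse into a single linear layer; with ReLU in between, the output is no longer an affine function of the input.

```python
import math

# Hypothetical weights: 1 input -> 2 hidden units -> 1 output.
W1 = [1.0, -1.0]  # layer 1 weights, one per hidden unit
b1 = [0.0, 0.5]   # layer 1 biases
W2 = [2.0, 3.0]   # layer 2 weights
b2 = 0.25         # layer 2 bias

def relu(v):
    return max(0.0, v)

def identity(v):
    return v

def two_layer(x, activation):
    """Layer 1 (linear) -> activation -> layer 2 (linear)."""
    hidden = [activation(w * x + b) for w, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

def is_affine_on(f, xs):
    """For equally spaced xs, an affine f produces equally spaced outputs."""
    ys = [f(x) for x in xs]
    steps = [y1 - y0 for y0, y1 in zip(ys, ys[1:])]
    return all(math.isclose(s, steps[0], abs_tol=1e-9) for s in steps)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Without an activation, the two layers equal one linear layer
# with weight W2 . W1 and bias W2 . b1 + b2.
w_eff = sum(w2 * w1 for w2, w1 in zip(W2, W1))
b_eff = sum(w2 * bb for w2, bb in zip(W2, b1)) + b2
print(all(math.isclose(two_layer(x, identity), w_eff * x + b_eff) for x in xs))  # True

print(is_affine_on(lambda x: two_layer(x, identity), xs))  # True
print(is_affine_on(lambda x: two_layer(x, relu), xs))      # False
```

This is why the activation between the layers, not the loss, is what makes the second model non-linear.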