What is the difference between a loss function and a cost function? Are the two not the same?

  • What is a loss function? What is its purpose?
  • What is a cost function? What does it tell us?

A loss function is a method of evaluating how well your algorithm models your data set. If your predictions are far off, the loss function outputs a higher number; if they are close, it outputs a lower one. As you tune your algorithm to improve your model, the loss function tells you whether you are improving or not. In short, the loss measures how much the predicted value differs from the actual value.

The loss is computed for a single training example. If we have m examples, the average of the loss over the entire training set is called the cost function.

Cost function (J) = (sum of the losses over the m examples) / m
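
The relationship above can be sketched in a few lines of Python. This is only an illustration: the squared loss is chosen as an example loss, and the names squared_loss and cost, as well as the data, are made up for this sketch.

```python
def squared_loss(y_pred, y_true):
    """Loss for a single training example (squared error, chosen as an example)."""
    return (y_pred - y_true) ** 2


def cost(y_preds, y_trues, loss_fn=squared_loss):
    """Cost J: the average of the per-example losses over the m examples."""
    m = len(y_trues)
    losses = [loss_fn(p, t) for p, t in zip(y_preds, y_trues)]
    return sum(losses) / m


# Made-up data: three training examples.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

per_example = [squared_loss(p, t) for p, t in zip(y_pred, y_true)]
J = cost(y_pred, y_true)

print(per_example)  # one loss value per example: [1.0, 0.0, 4.0]
print(J)            # a single number for the whole set: 5/3
```

The loss is evaluated three times here, once per example, while the cost is a single number summarizing the whole set.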

A cost function measures the performance of a machine learning model on given data. It quantifies the error between predicted values and expected values and expresses it as a single real number. Depending on the problem, the cost function can be formed in many different ways.

The purpose of a cost function is to be either:

  • Minimized: the returned value is then usually called cost, loss or error. The goal is to find the values of the model parameters for which the cost function returns as small a number as possible.
  • Maximized: the value it yields is then called a reward. The goal is to find the values of the model parameters for which the returned number is as large as possible.
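
As a rough sketch of the two cases, the snippet below searches a small grid of candidate values for one parameter w in a toy model y = w * x. The data, the grid, and the choice of defining the reward as the negated cost are all assumptions made for illustration; with that choice, minimizing the cost and maximizing the reward pick the same parameter.

```python
# Made-up data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]


def cost(w):
    """Mean squared error of the toy model y = w * x (to be minimized)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def reward(w):
    """Illustrative reward, defined here as the negated cost (to be maximized)."""
    return -cost(w)


# Candidate parameter values 0.0, 0.1, ..., 4.0.
candidates = [i / 10 for i in range(41)]

w_min_cost = min(candidates, key=cost)
w_max_reward = max(candidates, key=reward)

print(w_min_cost, w_max_reward)  # both searches select the same w
```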

In other words, the terms cost function and loss function mean almost the same thing. However, the loss function applies to a single training example, whereas the cost function is the penalty over a number of training examples or the complete batch. The cost function is also sometimes called an error function. In short, the loss function is a part of the cost function: the cost is calculated as an average of the losses. The loss is calculated for every instance, so in a single training cycle the loss is calculated many times, but the cost is calculated only once.


Let us look at the following two examples:

Example 1:
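One of the loss functions used in linear regression, the square loss:

$$L(\hat{y}_i, y_i) = (\hat{y}_i - y_i)^2$$

One of the cost functions used in linear regression, the mean squared error:

$$J = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2$$
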

Example 2:
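One of the loss functions used in SVM, the hinge loss (with labels $y_i \in \{-1, +1\}$ and model output $f(x_i)$):

$$L(f(x_i), y_i) = \max(0,\ 1 - y_i f(x_i))$$
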
SVM cost function:

$$J(w) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

(there are additional constraints connecting $\xi_i$ and $w$ with the training set)
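
For Example 2, a minimal sketch of the same loss-versus-cost split is given below. It assumes the common soft-margin form of the SVM cost, 0.5 * ||w||^2 + C * (sum of hinge losses), with labels in {-1, +1}. The function names, the value of C, and the data are made up for illustration. Note that this cost adds a regularization term and sums the losses rather than averaging them, so a cost function is not always a plain average of the loss.

```python
def hinge_loss(score, y):
    """Hinge loss for one example; y is assumed to be -1 or +1."""
    return max(0.0, 1.0 - y * score)


def svm_cost(w, b, X, Y, C=1.0):
    """Assumed soft-margin cost: 0.5 * ||w||^2 + C * sum of per-example hinge losses."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
    total_hinge = sum(hinge_loss(s, y) for s, y in zip(scores, Y))
    return 0.5 * sum(wj * wj for wj in w) + C * total_hinge


# Made-up data: three 2-D points with labels in {-1, +1}.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1]
w, b = [1.0, 0.0], 0.0

scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
losses = [hinge_loss(s, y) for s, y in zip(scores, Y)]
J = svm_cost(w, b, X, Y, C=1.0)

print(losses)  # per-example hinge losses: [0.0, 0.5, 0.0]
print(J)       # 0.5 * 1.0 + 1.0 * 0.5 = 1.0
```

Only the second point lies inside the margin, so it is the only one contributing a nonzero loss to the cost.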