- What is an epoch?
- What is a batch, and what is batch size?
- What is an iteration?
- Why do we use more than one epoch?

**Sample:** A sample is a single row of data. It contains inputs that are fed into the algorithm and an output that is compared with the prediction to calculate an error. A training dataset consists of many rows of data, i.e. many samples. A sample may also be called an instance, an observation, an input vector, or a feature vector.

**Batch and Batch Size:** The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters. Think of a batch as a for-loop iterating over one or more samples and making predictions. At the end of the batch, the predictions are compared to the expected output variables and an error is calculated. From this error, the update algorithm is used to improve the model. A training dataset can be divided into one or more batches.

When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.
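The three variants above differ only in the batch size. A minimal sketch, assuming the dataset is a plain Python list; the helper names `make_batches` and `gradient_descent_variant` are hypothetical and do not come from any library:

```python
def make_batches(samples, batch_size):
    """Split a list of samples into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def gradient_descent_variant(num_samples, batch_size):
    """Name the gradient descent variant implied by the batch size."""
    if batch_size >= num_samples:
        return "batch gradient descent"
    if batch_size == 1:
        return "stochastic gradient descent"
    return "mini-batch gradient descent"


# Toy dataset of 10 samples (illustrative values only).
samples = list(range(10))
for batch_size in (10, 1, 4):
    batches = make_batches(samples, batch_size)
    variant = gradient_descent_variant(len(samples), batch_size)
    print(f"batch_size={batch_size}: {variant}, {len(batches)} batch(es)")
```

Note that with a batch size of 4 the last batch holds only 2 samples, because 10 is not a multiple of 4.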

**Epoch:** One epoch is one complete pass of the entire training dataset forward and backward through the neural network. Since the whole dataset is usually too large to feed to the computer at once, it is divided into several smaller batches. The number of epochs is traditionally large, often hundreds or thousands, allowing the learning algorithm to run until the error from the model has been sufficiently minimized.

**Difference between batch and epoch**

- The batch size is the number of samples processed before the model is updated.
- The number of epochs is the number of complete passes through the training dataset.
- The number of epochs can be set to an integer value between one and infinity. You can run the algorithm for as long as you like and even stop it using other criteria besides a fixed number of epochs, such as a change (or lack of change) in model error over time.
- The batch size and the number of epochs are both integer values, and both are hyperparameters of the learning algorithm.
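The relationship between the two hyperparameters can be sketched as a nested loop: the outer loop counts epochs, the inner loop walks through the batches and updates the model once per batch. This is only an illustration, not a reference implementation. The toy model `y = w * x`, the data, the learning rate, and the `tolerance` stopping rule (stop when the loss barely changes between epochs) are all assumptions made for the example.

```python
def train(xs, ys, batch_size, max_epochs, learning_rate=0.01, tolerance=1e-9):
    """Fit y = w * x with mini-batch gradient descent on mean squared error.

    Returns (w, epochs_run, updates).
    """
    w = 0.0
    updates = 0
    epochs_run = 0
    previous_loss = None
    for epoch in range(1, max_epochs + 1):        # one epoch = one full pass
        epochs_run = epoch
        for start in range(0, len(xs), batch_size):  # one iteration = one batch
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            # Gradient of the batch mean squared error with respect to w.
            grad = sum(2 * (w * x - y) * x
                       for x, y in zip(batch_x, batch_y)) / len(batch_x)
            w -= learning_rate * grad             # model updated once per batch
            updates += 1
        # Loss over the full dataset, used only for the stopping criterion.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if previous_loss is not None and abs(previous_loss - loss) < tolerance:
            break                                 # loss no longer changing
        previous_loss = loss
    return w, epochs_run, updates


# Toy data generated from y = 3 * x (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]
w, epochs_run, updates = train(xs, ys, batch_size=4, max_epochs=100)
print(f"w={w:.4f} after {epochs_run} epochs and {updates} updates")
```

With 8 samples and a batch size of 4, every epoch performs 2 updates, so the update count is always twice the number of epochs actually run.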

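**Iterations:** The number of iterations is the number of batches needed to complete one epoch. Each iteration processes one batch and performs one update of the model parameters.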

**Example:** Let’s say a dataset has 30000 training samples. If the dataset is divided into batches of 1000 samples (the batch size), it takes 30 iterations to complete 1 epoch.
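The same arithmetic as a small sketch. The helper name `iterations_per_epoch` is hypothetical, and rounding up assumes that a final, smaller batch is still processed when the dataset size is not a multiple of the batch size (some frameworks can drop it instead). The epoch count of 5 is likewise just an illustrative value.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of batches (and therefore updates) needed for one epoch."""
    return math.ceil(num_samples / batch_size)


per_epoch = iterations_per_epoch(30000, 1000)
print(per_epoch)  # 30

# Total number of parameter updates over several epochs.
num_epochs = 5  # assumed value for illustration
print(per_epoch * num_epochs)  # 150
```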

**Why should one use more than one epoch?**

Passing the entire dataset through a neural network once is not enough; the full dataset needs to be passed through the same network multiple times. The dataset is limited, and gradient descent, which is used to optimise the learning, is an iterative process, so updating the weights with a single pass (one epoch) is not enough. One epoch typically leads to underfitting. As the number of epochs increases, the number of times the weights are updated also increases, and the fitted curve goes from **underfitting** to **optimal** to **overfitting**.
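The underfitting end of this progression can be illustrated with a toy experiment: the same small dataset is passed through the same one-parameter model for 1, 10, and 100 epochs, and the training loss is compared. This is a rough sketch under assumed values (model `y = w * x`, data drawn from `y = 2 * x`, learning rate 0.01). Because it measures training loss only, it cannot show overfitting, which would require held-out validation data.

```python
def loss_after_epochs(num_epochs, batch_size=2, learning_rate=0.01):
    """Train y = w * x from w = 0 for num_epochs and return the training loss."""
    xs = [0.5, 1.0, 1.5, 2.0]          # toy inputs (assumed)
    ys = [2.0 * x for x in xs]         # targets generated from y = 2 * x
    w = 0.0
    for _ in range(num_epochs):
        for start in range(0, len(xs), batch_size):
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            # Gradient of the batch mean squared error with respect to w.
            grad = sum(2 * (w * x - y) * x
                       for x, y in zip(batch_x, batch_y)) / len(batch_x)
            w -= learning_rate * grad
    # Mean squared error over the whole training set.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


for n in (1, 10, 100):
    print(f"{n:>3} epoch(s): training loss = {loss_after_epochs(n):.6f}")
```

In this setup the weight moves only a small step towards its target in each update, so a single epoch leaves the model far from the data, while many epochs bring the training loss close to zero.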