Hello! I have a question about the linear regression part, specifically the “fit” function.

Can anyone explain how exactly stochastic gradient descent works? What I understood from the video is that instead of taking the full training data, SGD takes a subset. For example, if there are 1000 rows of training data, SGD might take 50 rows in every iteration. If that is the case, then why do we have to divide our data into batches using DataLoader?
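For concreteness, here is a minimal plain-Python sketch of the 1000-row / 50-row scenario. The `make_batches` helper is hypothetical. It only mimics the idea behind a PyTorch `DataLoader` with `batch_size=50` and `shuffle=True` (shuffle the rows each epoch, then hand them out in consecutive chunks). It is not the library's actual implementation.

```python
import random


def make_batches(n_rows, batch_size, shuffle=True, seed=None):
    """Split row indices 0..n_rows-1 into mini-batches for one epoch.

    Hypothetical helper that imitates the idea of DataLoader(shuffle=True);
    it is not PyTorch code.
    """
    indices = list(range(n_rows))
    if shuffle:
        # A random order, so every batch is a random subset of the rows
        random.Random(seed).shuffle(indices)
    # Consecutive slices of the (shuffled) index list form the batches;
    # the last batch is smaller if n_rows is not a multiple of batch_size
    return [indices[i:i + batch_size] for i in range(0, n_rows, batch_size)]


batches = make_batches(1000, 50, seed=0)
print(len(batches))     # 20 batches per epoch
print(len(batches[0]))  # 50 rows in each batch
# Every row is used exactly once per epoch
print(sorted(i for b in batches for i in b) == list(range(1000)))
```

In other words, the 50-row subsets that SGD works on are exactly these batches.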

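Dividing the data into batches with DataLoader is how SGD gets its subsets: each iteration computes the gradient on one batch instead of the whole dataset, so each update is cheaper and training is faster overall. However, a batch size that is too small or too large also costs time. Very small batches give noisy updates and make poor use of vectorized hardware, while very large batches make every step expensive. So the batch size is chosen as a trade-off that keeps training fast. SGD relies on the calculus idea of the gradient (derivative): it repeatedly steps against the gradient of the loss to move toward a (possibly local) minimum.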
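A minimal sketch of mini-batch SGD for a one-variable linear model, written in plain Python so it runs without PyTorch. `fit_sgd` is a hypothetical helper, not the course's `fit` function, and it assumes a mean squared error loss with a fixed learning rate. Each inner iteration takes one shuffled batch, computes the gradient of the loss on that batch only, and steps against it.

```python
import random


def fit_sgd(xs, ys, batch_size=50, lr=0.01, epochs=100, seed=0):
    """Mini-batch SGD for y ~ w*x + b using mean squared error.

    Hypothetical illustration only, not the course's fit() function.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            n = len(batch)
            # Gradient of the batch loss (1/n) * sum((w*x + b - y)^2)
            # with respect to w and b
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / n
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / n
            # Step against the gradient, i.e. downhill on the loss
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Made-up, noise-free data from y = 3x + 2, with 1000 rows
xs = [i / 1000 for i in range(1000)]
ys = [3 * x + 2 for x in xs]
w, b = fit_sgd(xs, ys, batch_size=50, lr=0.1, epochs=100)
print(w, b)  # close to w = 3, b = 2
```

Setting `batch_size=1` turns this into single-sample SGD, and `batch_size=len(xs)` turns it into full-batch gradient descent. The batch-size trade-off described above lies between those two extremes.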
