Understanding Stochastic Gradient Descent in Machine Learning

Helene
4 min read · Dec 9, 2021
In a previous article, we talked about the Gradient Descent algorithm. In this article, we will now consider the Stochastic Gradient Descent algorithm. We will try to understand it both intuitively and mathematically, and also see why it might be preferred over Gradient Descent.

Revisiting Gradient Descent

Recall that Gradient Descent is an algorithm for finding the minimum of a function. We imagine ourselves as a hiker who wants to hike down to the bottom of a valley. Mathematically, the update rule looks like this:

$$\theta_{t+1} = \theta_t - \eta \, \nabla J(\theta_t)$$

where:

  • Future Step: You are on your way downhill, and you are calculating the position of your next step. This future step is denoted by $\theta_{t+1}$.
  • Current Step: While you are calculating the position of your next step, you are already standing at your current position. This current step is denoted by $\theta_t$.
  • Step Size: This is simply the size of the step that we take, denoted by $\eta$ (often called the learning rate). This concept will be discussed in more detail later.
  • Direction: Then we arrive at the last part, which is simply the direction of the steepest descent. This is denoted by the gradient $\nabla J(\theta_t)$.

So this formula simply tells us the next position to move to, which lies in the direction of the steepest descent from where we are currently standing.
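
To make this concrete, here is a minimal sketch of the update rule in Python. The quadratic cost function, the learning rate of 0.1, and the function names are illustrative assumptions, not taken from the original article:

```python
def gradient_descent(grad, theta0, eta=0.1, n_steps=100):
    """Minimize a function via Gradient Descent.

    grad    : function returning the gradient of the cost at theta
    theta0  : starting position (the hiker's initial spot)
    eta     : step size (learning rate)
    n_steps : number of downhill steps to take
    """
    theta = theta0
    for _ in range(n_steps):
        # theta_{t+1} = theta_t - eta * grad J(theta_t)
        theta = theta - eta * grad(theta)
    return theta

# Illustrative example: minimize J(theta) = (theta - 3)^2,
# whose gradient is 2 * (theta - 3); the true minimum is at theta = 3.
minimum = gradient_descent(grad=lambda t: 2 * (t - 3), theta0=0.0)
print(minimum)  # approximately 3.0
```

Each iteration simply takes one step of size $\eta$ against the gradient, so the position drifts downhill until it settles near the minimum.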

Understanding Stochastic Gradient Descent