What will you learn in this section?
- Explore a simple minimization problem and solve it using Gradient Descent.
A Simple Minimization Problem
Let's examine a mathematical function and determine its minimum point.
Consider the quadratic function:
\begin{align}
y = (z - 2)^2
\end{align}
How can we determine the minimum point of this function?
A Simple Solution
We can differentiate the function with respect to \( z \) and set the derivative to zero. This gives us the point where the function reaches its minimum:
\begin{align}
\frac{dy}{dz} = 2(z - 2) = 0 \quad \Rightarrow \quad z = 2
\end{align}
Thus, \( z = 2 \) is the minimum point.
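As a quick check, here is a minimal sketch that reproduces this calculus with the sympy library (using sympy is our choice for illustration, not something the lesson prescribes):

```python
import sympy as sp

z = sp.symbols("z")
y = (z - 2) ** 2

dy_dz = sp.diff(y, z)                    # derivative: 2*z - 4
minimum = sp.solve(sp.Eq(dy_dz, 0), z)   # solve 2*z - 4 = 0
print(minimum)                           # [2] -- the minimum point
```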
This approach works well for simple functions. However, when dealing with complex objective functions that have multiple parameters, it becomes difficult to apply this method.
An example of such a function is the squared error cost function from linear regression:
\begin{align}
J(a_1, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (a_1 x_i + b) \right)^2
\end{align}
This function contains two parameters: \( a_1 \) (the slope) and \( b \) (the intercept). Solving the equations that result from setting both partial derivatives to zero quickly becomes unwieldy, and for more complex objective functions it is not feasible at all.
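To make this concrete, here is a small sketch of that cost in code (the sample data and function name are illustrative assumptions, not part of the lesson):

```python
import numpy as np

def squared_error_cost(a1, b, x, y):
    """Mean squared error of the line y_hat = a1 * x + b."""
    predictions = a1 * x + b
    return np.mean((y - predictions) ** 2)

# Illustrative data: points lying near the line y = 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 3.9, 7.2, 9.8])

print(squared_error_cost(3.0, 1.0, x, y))  # small cost: the line fits well
print(squared_error_cost(0.0, 0.0, x, y))  # large cost: the line fits poorly
```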
Is there an alternative method that works for both simple and complex functions? YES! Gradient Descent is the solution.
Gradient Descent
Gradient Descent is an optimization algorithm used to find the minimum point of a differentiable convex function. It works iteratively, moving step-by-step toward the minimum. The algorithm follows these major steps (a code sketch combining them appears after the list):
1. Define the objective function and identify its parameters.
   For example, our objective function is \( y = (z - 2)^2 \), and we aim to minimize it with respect to \( z \).
2. Initialize parameter values.
   Let's start with an initial value of \( z = 0 \). Any starting value can be chosen.
3. Compute the gradient of the function with respect to each parameter.
   \begin{align}
   \frac{\partial y}{\partial z} = 2 (z - 2)
   \end{align}
   At \( z = 0 \), the gradient is \( -4 \); at \( z = 1 \), it is \( -2 \). We can compute the gradient for any given \( z \).
4. Update parameter values.
   This step uses the computed gradients to update each parameter:
   \begin{align}
   z = z - \alpha \times \frac{\partial y}{\partial z}
   \end{align}
   Here, \( \alpha \) is the learning rate, which controls the step size. It is a user-defined parameter. Let's assume \( \alpha = 0.1 \) and the initial value \( z = 0 \), and run a few iterations of Gradient Descent.
   Iteration 1 (the gradient at \( z = 0 \) is \( -4 \)):
   \begin{align}
   z &= 0 - 0.1 \times (-4) \\
   z &= 0.4
   \end{align}
   After Iteration 1, \( z = 0.4 \).
   Iteration 2 (the gradient at \( z = 0.4 \) is \( 2(0.4 - 2) = -3.2 \)):
   \begin{align}
   z &= 0.4 - 0.1 \times (-3.2) \\
   z &= 0.72
   \end{align}
   After Iteration 2, \( z = 0.72 \).
Repeat Steps 3 & 4 until convergence.
The process continues until a stopping criterion is met. Common stopping criteria include:- Running the algorithm for a fixed number of iterations.
- Stopping when the change in the cost function is insignificant (e.g., \( \text{new cost} - \text{old cost} < 10^{-6} \)).
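Putting all five steps together, here is a minimal runnable sketch of the full loop for our example function (the function names, iteration cap, and tolerance are illustrative assumptions):

```python
def cost(z):
    """Objective function y = (z - 2)^2."""
    return (z - 2) ** 2

def gradient(z):
    """Derivative dy/dz = 2 * (z - 2)."""
    return 2 * (z - 2)

z = 0.0        # Step 2: initial value (any starting value works)
alpha = 0.1    # learning rate, user-defined

old_cost = cost(z)
for iteration in range(1, 1001):          # criterion 1: fixed max iterations
    z = z - alpha * gradient(z)           # Steps 3 & 4: compute gradient, update z
    new_cost = cost(z)
    if abs(new_cost - old_cost) < 1e-6:   # criterion 2: insignificant change in cost
        break
    old_cost = new_cost

print(f"Stopped after {iteration} iterations: z = {z:.4f}")  # z is close to 2
```

Each pass through the loop performs Steps 3 & 4; the loop exits as soon as either stopping criterion is met, with \( z \) close to the true minimum at \( 2 \).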
Summary of this Lesson
We explored a convex cost function and applied Gradient Descent to find its minimum point.