What You Will Learn in This Section
- Logistic Regression Model
- Cost Function Used in Logistic Regression
- Gradient Descent Optimizer for Logistic Regression
In this section, we will systematically explore the key components of the Logistic Regression algorithm and then integrate them to understand the complete model. The major components of the Logistic Regression Model are:
- Model
- Cost Function
- Optimizer
1. Model
The model defines the relationship between the dependent variable \(y\) and independent variables \(x\). Each algorithm has its own model equation. Logistic Regression directly models the probability of data points belonging to the positive class. It computes these probabilities in two steps:
- First, it calculates the logit, \( z = \theta_0+\theta_1*x_1+\dots+\theta_k*x_k \), using Equation (1).
- Next, it computes the probability by passing the logit through the sigmoid function, given by Equation (3):
\begin{align} \hat{y} = \frac{1}{1+e^{-(\theta_0+\theta_1*x_1+\dots+\theta_k*x_k)}} \tag{3} \end{align}
\( \hat{y} \) is the predicted probability score, which represents the likelihood that a given data point belongs to class 1. A threshold is used to classify each data point as positive or negative. For example, if the probability score is greater than or equal to 0.5, the predicted class is 1; otherwise, it is 0.
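To make these two steps concrete, here is a minimal NumPy sketch (the function names predict_proba and predict_class are illustrative, not taken from any particular library):

```python
import numpy as np

def predict_proba(X, theta_0, theta):
    """Step 1: compute the logit; step 2: squash it with the sigmoid."""
    z = theta_0 + X @ theta              # logit for each row of X, shape (m,)
    return 1.0 / (1.0 + np.exp(-z))      # predicted probability of class 1

def predict_class(X, theta_0, theta, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return (predict_proba(X, theta_0, theta) >= threshold).astype(int)
```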
2. Cost Function
The cost function measures the difference between actual and predicted values of the dependent variable. The model equation provides the predicted values \( (\hat{y}) \), while the dataset contains the actual values. Logistic Regression uses the binary cross-entropy loss function, given by:
\begin{align}
J(\theta_0,\theta_1,...,\theta_k) = -\frac{1}{m} \sum_{i=1}^m \left\{ y_i*\log(\hat{y_i}) + (1-y_i)*\log(1-\hat{y_i}) \right\}
\end{align}
This function computes the average cost over the entire dataset, where \( m \) represents the number of data points.
The core idea behind the cost function is that if the predicted and actual values are close, the cost should be low; otherwise, it should be high. Below are some example cases demonstrating this concept:
- Actual class = 1, Predicted Probability = 0.9 \begin{align} \text{error} &= - \{ 1 * \log(0.9) + (1-1)*\log(1-0.9) \} \\ &= 0.105 \end{align}
- Actual class = 1, Predicted Probability = 0.1 \begin{align} \text{error} &= - \{ 1 * \log(0.1) + (1-1)* \log(1-0.1) \} \\ &= 2.302 \end{align}
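Both cases can be checked numerically. Below is a small sketch (binary_cross_entropy is an illustrative helper, not a library function) that reproduces these values and averages the cost over a batch:

```python
import numpy as np

def binary_cross_entropy(y, y_hat):
    """Average binary cross-entropy cost over m data points."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(binary_cross_entropy([1], [0.9]))   # ~0.105 -> prediction close to actual class, low cost
print(binary_cross_entropy([1], [0.1]))   # ~2.302 -> prediction far from actual class, high cost
```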
3. Optimizer
The optimizer finds the optimal values of parameters \( \theta_0, \theta_1, ..., \theta_k \) to minimize the average cost. In this section, we focus on the gradient descent optimizer in the context of logistic regression.
Gradient descent consists of three main steps:
- Parameter Initialization: All parameters are initialized, typically to zero for simplicity.
- Gradient Calculation: Gradients of the cost function with respect to each parameter are computed using the following formulas, where \( x_{i1}, ..., x_{ik} \) denote the feature values of the \( i \)-th data point:
\begin{align} \frac{\partial J}{\partial \theta_0} &= \frac{1}{m} * \sum_{i=1}^m (\hat{y_i} - y_i) \\ \frac{\partial J}{\partial \theta_1} &= \frac{1}{m} * \sum_{i=1}^m (\hat{y_i} - y_i) * x_{i1} \\ &\;\;\vdots \\ \frac{\partial J}{\partial \theta_k} &= \frac{1}{m} * \sum_{i=1}^m (\hat{y_i} - y_i) * x_{ik} \end{align}
- Updating Model Parameters: Parameters are updated using the learning rate \( \alpha \) (see the NumPy sketch after this list):
\begin{align} \theta_0 &= \theta_0 - \alpha * \frac{\partial J}{\partial \theta_0} \\ \theta_1 &= \theta_1 - \alpha * \frac{\partial J}{\partial \theta_1} \\ &\;\;\vdots \\ \theta_k &= \theta_k - \alpha * \frac{\partial J}{\partial \theta_k} \end{align}
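These gradient and update formulas form the inner loop of gradient descent. As a vectorized NumPy sketch (the function and variable names are illustrative; X is the m-by-k feature matrix and y the vector of 0/1 labels), one iteration could look like:

```python
import numpy as np

def gradient_step(theta_0, theta, X, y, alpha):
    """Compute all gradients and apply one parameter update."""
    m = len(y)
    y_hat = 1.0 / (1.0 + np.exp(-(theta_0 + X @ theta)))  # current predictions
    d_theta_0 = np.sum(y_hat - y) / m                      # dJ/d(theta_0)
    d_theta = X.T @ (y_hat - y) / m                        # dJ/d(theta_1..theta_k)
    return theta_0 - alpha * d_theta_0, theta - alpha * d_theta
```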
Bringing It All Together
We have examined the model equation and objective function used in Logistic Regression.
We then explored how Gradient Descent iteratively determines the model parameters.
The entire process is summarized in the following algorithm.
```python
import numpy as np

def fit_logistic_regression(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent for logistic regression.
    X: (m, k) feature matrix, y: (m,) labels in {0, 1}."""
    m, k = X.shape
    theta = np.zeros(k)      # theta_1 ... theta_k, initialized to zero
    theta_0 = 0.0            # intercept theta_0, initialized to zero
    for _ in range(n_iters):
        # compute predicted probabilities with the current parameters
        y_hat = 1.0 / (1.0 + np.exp(-(theta_0 + X @ theta)))
        # compute gradients of the binary cross-entropy cost
        d_theta_0 = np.sum(y_hat - y) / m
        d_theta = X.T @ (y_hat - y) / m
        # update all parameters with the learning rate alpha
        theta_0 -= alpha * d_theta_0
        theta -= alpha * d_theta
    return theta_0, theta
```
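As a quick end-to-end check on a purely illustrative synthetic dataset, the routine above could be used as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # linearly separable toy labels

theta_0, theta = fit_logistic_regression(X, y, alpha=0.5, n_iters=2000)
y_hat = 1.0 / (1.0 + np.exp(-(theta_0 + X @ theta)))
accuracy = ((y_hat >= 0.5) == y).mean()       # threshold at 0.5, as described above
print(theta_0, theta, accuracy)
```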