Key Takeaways from this Section
- Mathematical Understanding of Linear Regression Model with Multiple Variables
Up until now, we've explored linear regression using a single variable. In real-world scenarios, however,
regression models typically involve multiple independent variables. Let's examine how the mathematical
equations adapt to this multi-variable setting.
Our dataset \( (X, y) \) consists of \(X\) with dimensions \( (N,k) \) and \(y\) with dimensions \( (N,1) \),
where \(N\) denotes the number of data points and \(k\) represents the number of features in \(X\).
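For the code sketches in this section, we can build a small synthetic dataset with these shapes (a hypothetical setup: the feature count, true parameters, and noise level are arbitrary illustrative choices):

import numpy as np

N, k = 100, 3                               # number of data points and features
rng = np.random.default_rng(0)
X = rng.normal(size=(N, k))                 # feature matrix, shape (N, k)
true_a = np.array([2.0, -1.0, 0.5])         # weights used only to simulate targets
true_b = 4.0
y = X @ true_a + true_b + 0.1 * rng.normal(size=N)   # targets, stored as a flat (N,) array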
We will analyze the three primary mathematical aspects of Linear Regression.
1. Model Representation
This defines the relationship between independent variables \(X\) and the dependent variable \(y\).
Since we have \(k\) features, our model will include \(k+1\) parameters:
\begin{align}
\hat{y} = a_1 x_1 + a_2 x_2 + ... + a_k x_k + b
\end{align}
Here, \(a_1, a_2, ..., a_k\) are weights, and \(b\) is the bias term.
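Written in code, the prediction is simply a dot product between each data point's feature vector and the weight vector, plus the bias (a minimal sketch using the synthetic X defined above; predict is an illustrative helper name):

a = np.zeros(k)    # weights a_1, ..., a_k
b = 0.0            # bias

def predict(X, a, b):
    # y_hat = a_1 * x_1 + a_2 * x_2 + ... + a_k * x_k + b, applied to every row of X
    return X @ a + b

y_hat = predict(X, a, b)   # predictions, shape (N,)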
2. Cost Function
We will utilize the same Least Squares error cost function as before. However, in this case, it incorporates
\(k+1\) parameters:
\begin{align}
J(a_1, a_2, ..., a_k, b) &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i})^2}{2N} \\
J(a_1, a_2, ..., a_k, b) &= \frac{\sum_{i=1}^{N} (y_i - (a_1 x_{1i} + a_2 x_{2i} + ... + a_k x_{ki} + b))^2}{2N}
\end{align}
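The cost function translates directly into a few lines of code (a sketch that reuses the hypothetical predict helper above):

def cost(X, y, a, b):
    # J = sum_i (y_i - y_hat_i)^2 / (2N)
    residuals = y - predict(X, a, b)
    return np.sum(residuals ** 2) / (2 * len(y))

print(cost(X, y, a, b))   # cost with all parameters still at zero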
3. Optimization Process
We employ gradient descent as the optimization algorithm for linear regression, consisting of three steps:
- Initialization: We begin by assigning starting values to the parameters (either small random values or simply zeros). For instance, we can initialize \( a_1 = 0, a_2 = 0, ..., a_k = 0, b = 0 \).
- Gradient Computation: Since we have \(k+1\) parameters, we compute the gradient of the cost function with respect to each of them (writing \(\partial a_j\) as shorthand for \(\frac{\partial J}{\partial a_j}\)):
\begin{align}
\partial a_1 &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-x_{1i})}{N} \\
\partial a_2 &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-x_{2i})}{N} \\
&\vdots \\
\partial a_k &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-x_{ki})}{N} \\
\partial b &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-1)}{N}
\end{align}
- Parameter Update: The optimizer updates each parameter using its computed gradient (a single step of this procedure is sketched in code right after this list):
\begin{align}
a_1 &= a_1 - \alpha \times \partial a_1 \\
a_2 &= a_2 - \alpha \times \partial a_2 \\
&\vdots \\
a_k &= a_k - \alpha \times \partial a_k \\
b &= b - \alpha \times \partial b
\end{align}
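To make the gradient computation and parameter update concrete, here is a single gradient descent step written feature by feature so that it mirrors the equations above (a sketch reusing the arrays defined earlier in this section; alpha is a hypothetical learning rate):

alpha = 0.01                        # learning rate (illustrative value)
residuals = y - predict(X, a, b)    # y_i - y_hat_i for every data point

# gradient of the cost with respect to each weight a_j and the bias b
grad_a = np.array([np.mean(residuals * -X[:, j]) for j in range(k)])
grad_b = np.mean(-residuals)

# move every parameter in the direction opposite to its gradient
a = a - alpha * grad_a
b = b - alpha * grad_b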
Bringing It All Together
We have examined the model equation and objective function used in Linear Regression.
We then explored how Gradient Descent iteratively determines the model parameters.
The entire process is summarized in the algorithm below, written as a runnable Python/NumPy sketch (the function name fit_linear_regression and the default settings are illustrative choices).
import numpy as np

def fit_linear_regression(X, y, alpha=0.01, n_iters=1000):
    N, k = X.shape
    a = np.zeros(k)                    # initialize: a1 = a2 = ... = ak = 0
    b = 0.0                            # initialize: b = 0
    for _ in range(n_iters):           # repeat for n iterations
        residuals = y - (X @ a + b)    # y_i - y_hat_i
        # compute gradients: da1, da2, ..., dak, db
        grad_a = -(X.T @ residuals) / N
        grad_b = -np.mean(residuals)
        # update parameters
        a = a - alpha * grad_a
        b = b - alpha * grad_b
    return a, b
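On the synthetic dataset created at the start of the section, this loop should recover parameters close to the ones used to generate the data (a usage sketch; the learning rate and iteration count are arbitrary choices):

a_fit, b_fit = fit_linear_regression(X, y, alpha=0.05, n_iters=5000)
print("weights:", a_fit)   # expected to be close to true_a
print("bias:", b_fit)      # expected to be close to true_b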