Key Takeaways from this Section
- Mathematical Understanding of Linear Regression Model with Multiple Variables
Up until now, we've explored linear regression using a single variable. In real-world scenarios, however,
regression models typically involve multiple independent variables. Let's examine how the mathematical
equations adapt to this multi-variable setting.
Our dataset \( (X, y) \) consists of \(X\) with dimensions \( (N,k) \) and \(y\) with dimensions \( (N,1) \),
where \(N\) denotes the number of data points and \(k\) represents the number of features in \(X\).
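For the code sketches in this section, we can build a small synthetic dataset with these shapes (a hypothetical setup: the feature count, true parameters, and noise level are arbitrary illustrative choices):

import numpy as np

N, k = 100, 3                               # number of data points and features
rng = np.random.default_rng(0)
X = rng.normal(size=(N, k))                 # feature matrix, shape (N, k)
true_a = np.array([2.0, -1.0, 0.5])         # weights used only to simulate targets
true_b = 4.0
y = X @ true_a + true_b + 0.1 * rng.normal(size=N)   # targets, stored as a flat (N,) array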
We will analyze the three primary mathematical aspects of Linear Regression.
1. Model Representation
This defines the relationship between independent variables \(X\) and the dependent variable \(y\).
Since we have \(k\) features, our model will include \(k+1\) parameters:
\begin{align}
\hat{y} = a_1 x_1 + a_2 x_2 + ... + a_k x_k + b
\end{align}
Here, \(a_1, a_2, ..., a_k\) are weights, and \(b\) is the bias term.
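Written in code, the prediction is simply a dot product between each data point's feature vector and the weight vector, plus the bias (a minimal sketch using the synthetic X defined above; predict is an illustrative helper name):

a = np.zeros(k)    # weights a_1, ..., a_k
b = 0.0            # bias

def predict(X, a, b):
    # y_hat = a_1 * x_1 + a_2 * x_2 + ... + a_k * x_k + b, applied to every row of X
    return X @ a + b

y_hat = predict(X, a, b)   # predictions, shape (N,)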
2. Cost Function
We will utilize the same Least Squares error cost function as before. However, in this case, it incorporates
\(k+1\) parameters:
\begin{align}
J(a_1, a_2, ..., a_k, b) &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i})^2}{2N} \\
J(a_1, a_2, ..., a_k, b) &= \frac{\sum_{i=1}^{N} (y_i - (a_1 x_{1i} + a_2 x_{2i} + ... + a_k x_{ki} + b))^2}{2N}
\end{align}
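The cost function translates directly into a few lines of code (a sketch that reuses the hypothetical predict helper above):

def cost(X, y, a, b):
    # J = sum_i (y_i - y_hat_i)^2 / (2N)
    residuals = y - predict(X, a, b)
    return np.sum(residuals ** 2) / (2 * len(y))

print(cost(X, y, a, b))   # cost with all parameters still at zero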
3. Optimization Process
We employ gradient descent as the optimization algorithm for linear regression, consisting of three steps:
- Initialization: We begin by assigning starting values to the parameters (either small random values or simply zeros). For instance, we can initialize \( a_1 = 0, a_2 = 0, ..., a_k = 0, b = 0 \).
- Gradient Computation: Since we have \(k+1\) parameters, we compute the gradient of the cost function with respect to each of them (writing \(\partial a_j\) as shorthand for \(\frac{\partial J}{\partial a_j}\)):
\begin{align}
\partial a_1 &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-x_{1i})}{N} \\
\partial a_2 &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-x_{2i})}{N} \\
&\vdots \\
\partial a_k &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-x_{ki})}{N} \\
\partial b &= \frac{\sum_{i=1}^{N} (y_i - \hat{y_i}) (-1)}{N}
\end{align}
- Parameter Update: The optimizer updates each parameter using its computed gradient (a single step of this procedure is sketched in code right after this list):
\begin{align}
a_1 &= a_1 - \alpha \times \partial a_1 \\
a_2 &= a_2 - \alpha \times \partial a_2 \\
&\vdots \\
a_k &= a_k - \alpha \times \partial a_k \\
b &= b - \alpha \times \partial b
\end{align}
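To make the gradient computation and parameter update concrete, here is a single gradient descent step written feature by feature so that it mirrors the equations above (a sketch reusing the arrays defined earlier in this section; alpha is a hypothetical learning rate):

alpha = 0.01                        # learning rate (illustrative value)
residuals = y - predict(X, a, b)    # y_i - y_hat_i for every data point

# gradient of the cost with respect to each weight a_j and the bias b
grad_a = np.array([np.mean(residuals * -X[:, j]) for j in range(k)])
grad_b = np.mean(-residuals)

# move every parameter in the direction opposite to its gradient
a = a - alpha * grad_a
b = b - alpha * grad_b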
Bringing It All Together
We have examined the model equation and objective function used in Linear Regression.
We then explored how Gradient Descent iteratively determines the model parameters.
The entire process is summarized in the algorithm below, written as a runnable Python/NumPy sketch (the function name fit_linear_regression and the default settings are illustrative choices).
import numpy as np

def fit_linear_regression(X, y, alpha=0.01, n_iters=1000):
    N, k = X.shape
    a = np.zeros(k)                    # initialize: a1 = a2 = ... = ak = 0
    b = 0.0                            # initialize: b = 0
    for _ in range(n_iters):           # repeat for n iterations
        residuals = y - (X @ a + b)    # y_i - y_hat_i
        # compute gradients: da1, da2, ..., dak, db
        grad_a = -(X.T @ residuals) / N
        grad_b = -np.mean(residuals)
        # update parameters
        a = a - alpha * grad_a
        b = b - alpha * grad_b
    return a, b
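On the synthetic dataset created at the start of the section, this loop should recover parameters close to the ones used to generate the data (a usage sketch; the learning rate and iteration count are arbitrary choices):

a_fit, b_fit = fit_linear_regression(X, y, alpha=0.05, n_iters=5000)
print("weights:", a_fit)   # expected to be close to true_a
print("bias:", b_fit)      # expected to be close to true_b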