
Multiple Variable Linear Regression

Key Takeaways from this Section
  • Mathematical Understanding of Linear Regression Model with Multiple Variables

Up until now, we've explored linear regression using a single variable. However, in real-world scenarios, regression models typically involve multiple independent variables. Let's examine how the mathematical equations adapt when dealing with multiple independent variables.

Our dataset (X,y) consists of X with dimensions (N,k) and y with dimensions (N,1), where N denotes the number of data points and k represents the number of features in X.
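To make these shapes concrete, here is a minimal NumPy sketch that builds a toy dataset of this form. The feature count k = 3, the generating weights, and the noise level are illustrative assumptions, not part of the original setup; y is stored as a flat length-N vector rather than an (N,1) column, for convenience.

    import numpy as np

    rng = np.random.default_rng(0)
    N, k = 100, 3                         # N data points, k features (assumed values)
    X = rng.normal(size=(N, k))           # feature matrix, shape (N, k)
    true_a = np.array([2.0, -1.0, 0.5])   # "true" weights, for this demo only
    true_b = 4.0                          # "true" bias, for this demo only
    y = X @ true_a + true_b + 0.1 * rng.normal(size=N)  # targets, length N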

We will analyze the three primary mathematical aspects of Linear Regression.

1. Model Representation
This defines the relationship between the independent variables X and the dependent variable y. Since we have k features, our model will include k+1 parameters:

$$\hat{y} = a_1 x_1 + a_2 x_2 + \ldots + a_k x_k + b$$

Here, $a_1, a_2, \ldots, a_k$ are the weights, and $b$ is the bias term.
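In code, this model is a single dot product per data point. A minimal sketch reusing the toy X from above; the zero starting values match the initialization discussed later in this section:

    a = np.zeros(k)    # weights (a1, ..., ak)
    b = 0.0            # bias
    y_hat = X @ a + b  # predictions for all N points at once, shape (N,)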

2. Cost Function
We will utilize the same least-squares error cost function as before. However, in this case, it incorporates k+1 parameters:

$$J(a_1, a_2, \ldots, a_k, b) = \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{2N} = \frac{\sum_{i=1}^{N}\left(y_i - (a_1 x_{1i} + a_2 x_{2i} + \ldots + a_k x_{ki} + b)\right)^2}{2N}$$

where $x_{ji}$ is the value of feature $j$ for the $i$-th data point.
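Translated directly into NumPy, using y and y_hat from the snippets above:

    # J = sum((y_i - y_hat_i)^2) / (2N)
    J = np.sum((y - y_hat) ** 2) / (2 * N)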

3. Optimization Process
We employ gradient descent as the optimization algorithm for linear regression, consisting of three steps:

  • Initialization

    We begin by assigning starting values to the parameters, either at random or simply zeros. For instance, we can initialize $a_1 = 0, a_2 = 0, \ldots, a_k = 0, b = 0$.

  • Gradient Computation

    Since we have k+1 parameters, we compute the gradient of the cost function with respect to each:

    $$\frac{\partial J}{\partial a_1} = \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)(-x_{1i})}{N}$$
    $$\frac{\partial J}{\partial a_2} = \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)(-x_{2i})}{N}$$
    $$\vdots$$
    $$\frac{\partial J}{\partial a_k} = \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)(-x_{ki})}{N}$$
    $$\frac{\partial J}{\partial b} = \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)(-1)}{N}$$

  • Parameter Update

    The optimizer updates each parameter using its gradient and the learning rate $\alpha$ (a code sketch of one full step follows this list):

    $$a_1 = a_1 - \alpha \times \frac{\partial J}{\partial a_1}$$
    $$a_2 = a_2 - \alpha \times \frac{\partial J}{\partial a_2}$$
    $$\vdots$$
    $$a_k = a_k - \alpha \times \frac{\partial J}{\partial a_k}$$
    $$b = b - \alpha \times \frac{\partial J}{\partial b}$$
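Putting the gradient computation and parameter update together, here is a minimal NumPy sketch of a single gradient-descent step. It reuses X, y, N, a, and b from the earlier snippets; the learning-rate value alpha = 0.01 is an illustrative assumption:

    alpha = 0.01                    # learning rate (assumed value)
    y_hat = X @ a + b               # current predictions
    # X.T @ (y - y_hat) computes all k weighted sums at once
    da = -(X.T @ (y - y_hat)) / N   # gradients (dJ/da1, ..., dJ/dak)
    db = -np.sum(y - y_hat) / N     # gradient dJ/db
    a = a - alpha * da              # update weights
    b = b - alpha * db              # update bias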

Bringing It All Together

We have examined the model equation and objective function used in Linear Regression. We then explored how Gradient Descent iteratively determines the model parameters.
The entire process is summarized in the following algorithm, written as a runnable NumPy script. It assumes the X, y, N, and k from the toy example above; the learning-rate value is again an illustrative choice.

    import numpy as np

    # initialize: a1 = 0, a2 = 0, ..., ak = 0, b = 0
    a = np.zeros(k)        # weight vector (a1, ..., ak)
    b = 0.0                # bias
    alpha = 0.01           # learning rate (assumed value)
    n = 1000               # number of iterations

    for i in range(n):
        # compute gradients: da1, ..., dak, db
        y_hat = X @ a + b
        da = -(X.T @ (y - y_hat)) / N
        db = -np.sum(y - y_hat) / N
        # update parameters
        a = a - alpha * da
        b = b - alpha * db
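On the toy data generated at the start of this section, the learned parameters should land close to the generating values:

    print(a)   # roughly [ 2.0, -1.0,  0.5 ]
    print(b)   # roughly 4.0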