What will you learn in this section?
- We will explore different metrics that can be used to evaluate regression models.
- We will discuss the advantages and disadvantages of each metric.
So far, we have discussed the components of a linear regression model. However, we have not yet addressed how to evaluate the quality of the trained model. In this section, we will examine several metrics that can be used to assess the effectiveness of a regression model. These metrics help differentiate between a well-performing and a poorly performing model.
1. Mean Squared Error (MSE)
MSE measures how well a fitted line represents the data. It is calculated using the following formula: \begin{align} MSE = \frac{\sum_{i=1}^N (y_i-\hat{y_i})^2}{N} \end{align} The numerator in the above formula represents the sum of squared residuals, and dividing by the total number of observations yields the average residual error. The table below illustrates an example of how to calculate MSE.
S.N. | \( y_{true} \) | \( y_{predicted} \) | Residual \( (y_{true}-y_{predicted}) \) | Squared residual \( (y_{true}-y_{predicted})^2 \) |
---|---|---|---|---|
1 | 25 | 23 | 2 | 4 |
2 | 29 | 29.5 | -0.5 | 0.25 |
3 | 40 | 42 | -2 | 4 |
4 | 32 | 31 | 1 | 1 |
5 | 38 | 39.5 | -1.5 | 2.25 |
 | | | Sum of squared residuals | 11.5 |
\begin{align} MSE &= \frac{\text{Sum of squared residuals}}{\text{Number of data points}} \\ MSE &= \frac{11.5}{5} \\ MSE &= 2.3 \end{align}
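To make the calculation concrete, here is a minimal NumPy sketch that reproduces the table above (scikit-learn's `mean_squared_error` would return the same value):

```python
import numpy as np

# The five observations from the table above
y_true = np.array([25, 29, 40, 32, 38])
y_pred = np.array([23, 29.5, 42, 31, 39.5])

residuals = y_true - y_pred        # 2, -0.5, -2, 1, -1.5
mse = np.mean(residuals ** 2)      # average of the squared residuals
print(mse)                         # 2.3
```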
Issues with MSE

- Scale dependency: MSE is dependent on the scale of the \( y \) variable. A higher scale for \( y \) results in a higher MSE, and vice versa. The tables below illustrate this with two datasets: one with a lower scale and another with a higher scale (a code sketch after this list demonstrates the effect).

Table 2: data with a lower scale of the \( y \) variable

\( y_{true} \) | \( y_{predicted} \) |
---|---|
25 | 23 |
29 | 29.5 |
40 | 42 |
32 | 31 |
38 | 39.5 |

Table 3: data with a higher scale of the \( y \) variable

\( y_{true} \) | \( y_{predicted} \) |
---|---|
250 | 230 |
290 | 295 |
400 | 420 |
320 | 310 |
380 | 395 |

MSE (using Table 2 data) = 2.3

MSE (using Table 3 data) = 230

- No upper bound: MSE can take any positive value; there is no defined benchmark against which we can compare it. In one case an MSE of 100 units might indicate a good model, while in another case an MSE of 100 units may indicate the worst model.
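The scale dependency is easy to verify in code. The sketch below starts from the Table 2 values and multiplies them by 10 to obtain the Table 3 values; the residuals scale by 10, so the MSE scales by 100:

```python
import numpy as np

# Table 2: lower-scale data
y_true_low = np.array([25, 29, 40, 32, 38])
y_pred_low = np.array([23, 29.5, 42, 31, 39.5])

# Table 3: the same data scaled up by a factor of 10
y_true_high = 10 * y_true_low
y_pred_high = 10 * y_pred_low

mse_low = np.mean((y_true_low - y_pred_low) ** 2)
mse_high = np.mean((y_true_high - y_pred_high) ** 2)
print(mse_low, mse_high)   # 2.3 230.0 (same relative fit, 100x the MSE)
```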
2. \( R^2 \) Metric
The \( R^2 \) metric is a valuable measure used to evaluate regression models. It ranges from \( 0 \) to \( 1 \), with higher values indicating better model performance. Let's examine its definition and calculation. Below is the formula for calculating \( R^2 \): \begin{align} R^2 = 1 - \frac{\sum_{i=1}^N (y_i-\hat{y_i})^2}{\sum_{i=1}^N (y_i-\overline{y})^2} \end{align} Where:
\( \overline y \) = Mean value of the true \( y \)
\( y_i \) = Actual value
\( \hat y_i \) = Predicted value
\( \sum_{i=1}^N \ (y_i-\overline{y})^2 \) = Total variance in \( y \)
\( \sum_{i=1}^N (y_i-\hat y_i)^2 \) = Variance in \( y \) not captured by the model
Another interpretation of the \( R^2 \) metric
\( R^2 \) represents the proportion of variance explained by the model relative to the total variance in the dependent variable. The equations below illustrate this concept:
\begin{align}
R^2&=1- \frac{\text{Variance not captured by the model}}{\text{Total variance}} \\ \\
R^2&=\frac{\text{Total variance} - \text{Variance not captured by the model}}{\text{Total variance}} \\ \\
R^2&= \frac{\text{Variance explained by the model}}{\text{Total variance}}
\end{align}
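We can verify numerically that the two forms agree. The sketch below reuses the five-point example from the MSE section, treating those predictions as if they came from a fitted model:

```python
import numpy as np

y_true = np.array([25, 29, 40, 32, 38])
y_pred = np.array([23, 29.5, 42, 31, 39.5])

ss_res = np.sum((y_true - y_pred) ** 2)          # variance not captured by the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variance in y

r2_first_form = 1 - ss_res / ss_tot              # 1 - (unexplained / total)
r2_second_form = (ss_tot - ss_res) / ss_tot      # explained / total
print(r2_first_form, r2_second_form)             # both print approximately 0.926
```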
An example to better understand \( R^2 \)
Suppose we have a regression model with three independent variables, represented by the equation:
\begin{align}
\hat y = a_1 x_1 + a_2 x_2 + a_3 x_3 + b
\end{align}
After training the model, we obtain \( R^2 = 0.6 \). This means that 60% of the total variance in the dependent variable \( y \) is explained by the three independent variables in the model.
Limitations of \( R^2 \)
As we increase the number of variables in a model, the \( R^2 \) value computed on the training data will never decrease, and in practice it almost always increases. Consider the two models:
\begin{align}
\hat y &= a_1 x_1 + a_2 x_2 + a_3 x_3 + b \quad \text{(Model 1)} \\ \\
\hat y &= a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + b \quad \text{(Model 2)}
\end{align}
Model 2 will always have an \( R^2 \) at least as high as Model 1's, regardless of whether the newly added variable \( x_4 \) is actually useful: in the worst case, the fit can assign \( x_4 \) a coefficient of zero and recover Model 1 exactly. This means that \( R^2 \) does not account for overfitting.
Adding more variables can cause the model to overfit the training data, but \( R^2 \) alone cannot determine if the additional variables are beneficial. A good metric should indicate whether adding a new variable genuinely improves the model's performance.
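To see this behavior concretely, here is a small sketch with synthetic data (the data-generating process and random seed are illustrative assumptions): a pure-noise column is appended as \( x_4 \), and the training \( R^2 \) still does not decrease.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))   # three genuinely informative features
y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Model 2: the same features plus one pure-noise column (x4)
X_plus_noise = np.hstack([X, rng.normal(size=(n, 1))])

r2_model1 = LinearRegression().fit(X, y).score(X, y)                        # training R^2
r2_model2 = LinearRegression().fit(X_plus_noise, y).score(X_plus_noise, y)
print(r2_model2 >= r2_model1)   # True: R^2 never drops when a variable is added
```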
3. Adjusted \( R^2 \) Metric
Adjusted \( R^2 \) addresses the shortcomings of the \( R^2 \) metric. When a new variable is added to the model, if the improvement in performance is not significant, adjusted \( R^2 \) will decrease. This helps determine whether the newly added variable is meaningful. The formula for calculating adjusted \( R^2 \) is: \begin{align} \text{Adjusted } R^2 = 1 - \frac{(1-R^2) \cdot (n-1)}{(n-p-1)} \end{align} Where:
\( n \) = Number of data points
\( p \) = Number of independent variables in the model
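A minimal sketch of this formula as a Python function; the values \( R^2 = 0.6 \) and \( n = 50 \) below are hypothetical, chosen only to show how the penalty grows with \( p \):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n data points and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical numbers: the same R^2 is penalized more as p grows
print(adjusted_r2(0.6, n=50, p=3))   # ~0.574
print(adjusted_r2(0.6, n=50, p=4))   # ~0.564
```

If a newly added variable raises \( R^2 \) only marginally, the shrinking \( (n-p-1) \) denominator outweighs the gain, and adjusted \( R^2 \) falls.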