
Maximum Likelihood Estimation (MLE) for Logistic Regression

What You Will Learn in This Section
  • Derivation of Logistic Regression Cost Function using MLE

In logistic regression, the hypothesis is the sigmoid function applied to a linear combination of the features, \( h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} \), and we model the probability that a given input \( x \) belongs to class 1 as follows:

\[ P(y = 1 | x; \theta) = h_\theta(x) \]
\[ P(y = 0 | x; \theta) = 1 - h_\theta(x) \]

This can be rewritten in a more compact form:

\[ p(y | x; \theta) = (h_\theta(x))^y (1 - h_\theta(x))^{1 - y} \]
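As a quick numerical check (a minimal Python sketch; the value 0.75 below is an arbitrary stand-in for \( h_\theta(x) \)), substituting \( y = 1 \) and \( y = 0 \) into the compact form recovers the two cases above:

```python
def bernoulli_pmf(y, h):
    """Compact form p(y | x; theta) = h^y * (1 - h)^(1 - y)."""
    return h**y * (1.0 - h)**(1 - y)

h = 0.75  # example value of h_theta(x), i.e. P(y = 1 | x; theta)
print(bernoulli_pmf(1, h))  # 0.75 -> equals h_theta(x)
print(bernoulli_pmf(0, h))  # 0.25 -> equals 1 - h_theta(x)
```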

Likelihood Function

Assuming the \( n \) training examples were generated independently, the likelihood of the parameters is:

\[ L(\theta) = p(\mathbf{y} | X; \theta) = \prod_{i=1}^{n} p(y^{(i)} | x^{(i)}; \theta) \]

Substituting the probability formula:

\[ L(\theta) = \prod_{i=1}^{n} h_\theta(x^{(i)})^{y^{(i)}} (1 - h_\theta(x^{(i)}))^{1 - y^{(i)}} \]
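To make the product concrete, here is a minimal NumPy sketch (the labels and predicted probabilities are made-up toy values) that evaluates \( L(\theta) \) for four examples. A product of many numbers in \( (0, 1) \) shrinks toward zero very quickly, which is one reason to work with its logarithm next:

```python
import numpy as np

y = np.array([1, 0, 1, 1])          # toy labels y^(i) (illustrative)
h = np.array([0.9, 0.2, 0.7, 0.6])  # toy values of h_theta(x^(i)) (illustrative)

# L(theta) = prod_i h^y * (1 - h)^(1 - y)
likelihood = np.prod(h**y * (1.0 - h)**(1 - y))
print(likelihood)  # 0.9 * 0.8 * 0.7 * 0.6, approximately 0.3024
```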

Log-Likelihood Function

Because the logarithm is strictly increasing, maximizing the log of the likelihood is equivalent to maximizing the likelihood itself, and the log turns the product into a sum that is easier to differentiate and numerically more stable. The log-likelihood is:

\[ \ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] \]
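Continuing the same toy example (the values are repeated so the snippet stands alone), the sum below equals the log of the product computed earlier:

```python
import numpy as np

y = np.array([1, 0, 1, 1])          # toy labels (illustrative)
h = np.array([0.9, 0.2, 0.7, 0.6])  # toy values of h_theta(x^(i)) (illustrative)

# l(theta) = sum_i [ y log h + (1 - y) log(1 - h) ]
log_likelihood = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
print(log_likelihood)  # approximately -1.196

# Sanity check: equals the log of the product form L(theta)
print(np.isclose(log_likelihood, np.log(np.prod(h**y * (1 - h)**(1 - y)))))  # True
```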

Negative Log-Likelihood as Cost Function

The cost function for logistic regression is the negative log-likelihood averaged over the \( n \) training examples, also known as the binary cross-entropy loss:

\[ J(\theta) = - \frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] \]

Because negation turns maximization into minimization, the parameters that minimize \( J(\theta) \) are exactly the maximum likelihood estimates. The cost heavily penalizes confident predictions that disagree with the true labels, and since \( J(\theta) \) is convex in \( \theta \), gradient-based methods converge to the global minimum; unlike linear regression, there is no closed-form solution, so the parameters are found iteratively.
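Putting the pieces together, here is a minimal NumPy sketch of \( J(\theta) \) and a plain gradient descent loop on a tiny synthetic dataset (the data, learning rate, and iteration count are arbitrary illustrative choices, not part of the derivation above). The update uses the standard gradient of this cost, \( \nabla J(\theta) = \frac{1}{n} X^T (h_\theta(X) - y) \):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Binary cross-entropy J(theta) = -(1/n) sum[y log h + (1 - y) log(1 - h)]."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """Gradient of J(theta): (1/n) X^T (h - y)."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

# Tiny synthetic dataset (illustrative): an intercept column plus one feature
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
learning_rate = 0.5
for _ in range(1000):  # plain gradient descent on J(theta)
    theta -= learning_rate * gradient(theta, X, y)

print(theta)              # fitted parameters
print(cost(theta, X, y))  # small: predicted probabilities match the labels closely
```

Plain gradient descent is only one simple choice here; Newton's method or library solvers (e.g., scikit-learn's LogisticRegression) are common alternatives in practice.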