Maximum Likelihood Estimation for Logistic Regression

What You Will Learn in This Section
  • Derivation of the logistic regression cost function using MLE

In logistic regression, we model the probability that a given input x belongs to class 1 as follows:

$$P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x)$$

where the hypothesis $h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$ is the sigmoid function applied to a linear combination of the input features.

This can be rewritten in a more compact form:

$$p(y \mid x;\theta) = \left(h_\theta(x)\right)^{y} \left(1 - h_\theta(x)\right)^{1-y}$$
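As a quick sanity check, here is a minimal NumPy sketch (the variable names and the example score are illustrative assumptions) showing that the compact form reproduces both cases:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid hypothesis: maps a real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_pmf(y, h):
    # Compact form: h^y * (1 - h)^(1 - y) equals h when y = 1 and 1 - h when y = 0.
    return h**y * (1 - h)**(1 - y)

h = sigmoid(0.7)                   # an example predicted probability
print(bernoulli_pmf(1, h), h)      # both equal h_theta(x)
print(bernoulli_pmf(0, h), 1 - h)  # both equal 1 - h_theta(x)
```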

Likelihood Function

Given that the training examples are independent, the likelihood of the parameters is given by:

$$L(\theta) = p(y \mid X;\theta) = \prod_{i=1}^{n} p\left(y^{(i)} \mid x^{(i)};\theta\right)$$

Substituting the probability formula:

$$L(\theta) = \prod_{i=1}^{n} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$
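Evaluating this product directly is straightforward for a toy dataset (X, y, and theta below are made-up illustrative values), though it underflows quickly as n grows, which is one practical reason to move to the log-likelihood next:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(theta, X, y):
    # Product of per-example Bernoulli probabilities; each factor lies in (0, 1),
    # so the product shrinks toward 0 and underflows for large n.
    h = sigmoid(X @ theta)
    return np.prod(h**y * (1 - h)**(1 - y))

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])  # toy design matrix (intercept column first)
y = np.array([1, 0, 1])                              # toy labels
theta = np.array([0.1, 0.8])                         # toy parameters
print(likelihood(theta, X, y))
```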

Log-Likelihood Function

Because the logarithm is monotonically increasing, maximizing the log-likelihood is equivalent to maximizing the likelihood itself, and the log turns the product into a sum that is far easier to differentiate. Taking the log of L(θ):

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
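The same toy quantities can be evaluated through the sum form (a sketch; the small clipping constant is an assumption added to guard against log(0) when predictions saturate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y, eps=1e-12):
    # Sum of per-example log Bernoulli probabilities; clipping keeps the
    # logs finite when h is numerically 0 or 1.
    h = np.clip(sigmoid(X @ theta), eps, 1 - eps)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, 0, 1])
theta = np.array([0.1, 0.8])
print(log_likelihood(theta, X, y))  # equals the log of the product likelihood above
```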

Negative Log-Likelihood as Cost Function

The cost function for logistic regression is the negative log-likelihood averaged over the n training examples, also known as the binary cross-entropy loss:

$$J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Because J(θ) is simply −ℓ(θ)/n, minimizing it is equivalent to maximizing the log-likelihood: the parameters that minimize the binary cross-entropy loss are exactly the maximum likelihood estimates for logistic regression.
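Putting the pieces together, here is a minimal batch gradient descent sketch that minimizes this cost (the learning rate, iteration count, and toy data are illustrative assumptions, not tuned values; the gradient used is the standard (1/n)·Xᵀ(h − y) for the sigmoid hypothesis):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    # Binary cross-entropy: negative log-likelihood averaged over n examples.
    h = np.clip(sigmoid(X @ theta), eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def fit(X, y, lr=0.1, iters=5000):
    # Batch gradient descent on J(theta); its gradient is (1/n) X^T (h - y).
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= lr * (X.T @ (h - y)) / n
    return theta

# Toy linearly separable data with an intercept column.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = fit(X, y)
print("theta:", theta, "cost:", cost(theta, X, y))
```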