Fitting a Linear Model by the Normal Equation

We want to find a set of parameters \theta_{i} that best fits the equation:

y=\theta_{0}+\theta_{1}x_{1}+...+\theta_{n}x_{n}

given that the input values of the m training examples are stacked as the rows of the design matrix

X=\begin{bmatrix}-(x^{(1)})^{T}-\\ -(x^{(2)})^{T}-\\ \vdots \\ -(x^{(m)})^{T}-\end{bmatrix},

which is an m\times(n+1) matrix.

Notice that the width is n+1 rather than n: to allow for a non-zero intercept term, we add the extra entry x^{(i)}_{0}=1 to every training example.
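As a concrete illustration, here is a minimal NumPy sketch of this step (the function and variable names are my own, not part of the derivation) that prepends the column of ones to a raw m\times n input matrix:

```python
import numpy as np

def add_intercept(X_raw):
    """Prepend a column of ones so that x_0 = 1 for every training example."""
    m = X_raw.shape[0]
    return np.hstack([np.ones((m, 1)), X_raw])

# Example: 4 training examples with 2 raw features -> a 4 x 3 design matrix
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
X = add_intercept(X_raw)   # first column is all ones
```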

The target values are stacked into the m\times1 vector

\vec{y}=\begin{bmatrix}y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\end{bmatrix}.

We then want to choose \theta so that the least-squares cost J(\theta)=\frac{1}{2}(X\theta-\vec{y})^{T}(X\theta-\vec{y}) is minimized.
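In code, J(\theta) is simply half the squared Euclidean norm of the residual vector X\theta-\vec{y}; a minimal sketch under the same naming assumptions as above:

```python
def cost(theta, X, y):
    """J(theta) = 1/2 * (X theta - y)^T (X theta - y)."""
    r = X @ theta - y      # residual vector of length m
    return 0.5 * float(r @ r)
```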

Taking the gradient of J(\theta) with respect to \theta gives

\nabla_{\theta}J(\theta)=X^{T}X\theta-X^{T}\vec{y}

Setting the gradient to zero gives X^{T}X\theta=X^{T}\vec{y}; assuming X^{T}X is invertible, solving for \theta yields the normal equation

\theta=(X^{T}X)^{-1}X^{T}\vec{y}
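As a usage sketch (assuming NumPy, with illustrative data of my own), it is numerically preferable to solve the linear system X^{T}X\theta=X^{T}\vec{y} directly rather than form the inverse explicitly:

```python
import numpy as np

def fit_normal_equation(X, y):
    """Solve (X^T X) theta = X^T y without computing the matrix inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny check: data generated exactly by y = 1 + 2 * x_1
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # first column is the intercept term x_0 = 1
y = np.array([1.0, 3.0, 5.0])
theta = fit_normal_equation(X, y)   # approximately [1.0, 2.0]
```

If X^{T}X is singular or ill-conditioned (for example, because of redundant features), np.linalg.lstsq(X, y, rcond=None) is a more robust alternative.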
