# Fitting a Linear Model by the Normal Equation

We want to find parameters $\theta_{i}$ that best fit the hypothesis

$y=\theta_{0}+\theta_{1}x_{1}+...+\theta_{n}x_{n}$

given that the $m$ training examples' input values form the matrix

$\begin{bmatrix}-(x^{(1)})^{T}-\\ -(x^{(2)})^{T}-\\ \vdots \\ -(x^{(m)})^{T}-\end{bmatrix}$

which is an $m\times(n+1)$ matrix; call it $X$.

Notice that it is $n+1$ rather than $n$: to allow a non-zero intercept term, we add the constant feature $x^{(i)}_{0}=1$ to every training example.
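As a minimal sketch of this construction (the feature values below are made up for illustration), prepending a column of ones to the raw inputs produces the $m\times(n+1)$ design matrix:

```python
import numpy as np

# Hypothetical raw inputs: m = 3 training examples, n = 2 features each.
X_raw = np.array([[2.0, 3.0],
                  [1.0, 5.0],
                  [4.0, 0.5]])

# Prepend x_0 = 1 to every example, giving an m x (n+1) design matrix.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
print(X.shape)  # (3, 3)
```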

The target values are

$\begin{bmatrix}y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\end{bmatrix}$

which is an $m\times1$ vector; call it $\vec{y}$.

We seek the $\theta$ that minimizes the least-squares cost $J(\theta)=\frac{1}{2}(X\theta-\vec{y})^{T}(X\theta-\vec{y})$.

Taking the gradient of $J(\theta)$ with respect to $\theta$ gives:

$\nabla_{\theta}J(\theta)=X^{T}X\theta-X^{T}\vec{y}$
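This follows from expanding the quadratic form and applying the matrix-gradient identities $\nabla_{\theta}(\theta^{T}A\theta)=2A\theta$ (for symmetric $A$) and $\nabla_{\theta}(b^{T}\theta)=b$:

$J(\theta)=\frac{1}{2}\left(\theta^{T}X^{T}X\theta-2\vec{y}^{T}X\theta+\vec{y}^{T}\vec{y}\right)$

Differentiating term by term, the first term contributes $X^{T}X\theta$, the second contributes $-X^{T}\vec{y}$, and the last is constant in $\theta$.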

Setting the gradient to zero and solving for $\theta$ yields the normal equation (assuming $X^{T}X$ is invertible):

$\theta=(X^{T}X)^{-1}X^{T}\vec{y}$
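A quick numerical sketch of the solution, with synthetic data generated from known coefficients so the recovered $\theta$ can be checked (the data and true coefficients below are assumptions for illustration). In practice one solves the linear system $X^{T}X\theta=X^{T}\vec{y}$ rather than forming the explicit inverse, which is better conditioned numerically:

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise),
# so the normal equation should recover theta = [1, 2, 3].
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(20, 2))
X = np.hstack([np.ones((20, 1)), X_raw])   # add the intercept column x_0 = 1
y = X @ np.array([1.0, 2.0, 3.0])

# Solve X^T X theta = X^T y; np.linalg.solve avoids computing the inverse.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [1. 2. 3.]
```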