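We want to find the set of parameters $\theta = (\theta_0, \theta_1, \dots, \theta_n)$ which best fits the equation:
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$
given that the $m$ training examples' input values are
$$X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}$$
which is an $m \times (n+1)$ matrix.
Notice that it is $n+1$ rather than $n$: since we may have a non-zero intercept term $\theta_0$, we add $x_0 = 1$ to every training example.
The target values are
$$y = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix}^T$$
which is an $m$-dimensional vector.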
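The construction of the input matrix described above can be sketched in NumPy. This is a minimal illustration, not part of the original derivation: the helper name `add_intercept` and the toy array `raw` are hypothetical, and the code assumes the raw features arrive as an array with one row per training example.

```python
import numpy as np

def add_intercept(features):
    """Prepend a column of ones (the x_0 = 1 term) to an (m, n) feature array."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        # Treat a 1-D input as m examples with a single feature each.
        features = features.reshape(-1, 1)
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

# Hypothetical toy data: m = 3 training examples, n = 2 features each.
raw = np.array([[2.0, 3.0],
                [4.0, 5.0],
                [6.0, 7.0]])
X = add_intercept(raw)
print(X.shape)  # (3, 3), i.e. m rows and n + 1 columns
```

The extra column is what lets a single matrix product account for the intercept, instead of handling it as a separate additive term.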
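We want to find the $\theta$ such that the squared-error cost
$$J(\theta) = \frac{1}{2}(X\theta - y)^T (X\theta - y) = \frac{1}{2}\left(\theta^T X^T X \theta - 2\,\theta^T X^T y + y^T y\right)$$
is minimized.
Take the matrix derivative with respect to $\theta$:
$$\nabla_\theta J(\theta) = X^T X \theta - X^T y$$
Set the derivative to zero, hence
$$X^T X \theta = X^T y \quad\Longrightarrow\quad \theta = (X^T X)^{-1} X^T y$$
provided that $X^T X$ is invertible.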
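As a quick numerical check, the closed-form solution $\theta = (X^T X)^{-1} X^T y$ can be sketched in NumPy. This is an illustrative sketch under stated assumptions: the function name `normal_equation` and the toy data are hypothetical, $X$ is assumed to already contain the leading column of ones, and $X^T X$ is assumed to be invertible.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta.

    Assumes X already includes the intercept column and X^T X is invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data generated exactly from y = 1 + 2 * x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column, then the feature
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
print(theta)  # approximately [1. 2.]

# At the solution, the gradient X^T (X theta - y) should vanish.
gradient = X.T @ (X @ theta - y)
print(np.allclose(gradient, 0.0))
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a design choice for numerical stability. When $X^T X$ is singular or nearly so (for example, with duplicated features), `np.linalg.lstsq(X, y, rcond=None)` is a common alternative.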