Python – Fitting a Linear Model by the Normal Equation


First, we have to prepare a data file. The data are generated with the following equation:

y = 4.0+3.0X_{1}+X_{2}+\delta

, where \delta\sim N(0,1).

It is expected to get \theta very close to (4.0,3.0,1.0)

import numpy as np
import numpy.random as rnd

X1 = 2.0*rnd.rand(100,1)
X2 = 2.0*rnd.rand(100,1)
y = 4.0 + 3.0*X1 + X2 + rnd.randn(100,1)

with open("data.dat","w") as f:
    for i in range(len(X1)):

rnd.randn(100,1) generates a 100\times1 array of random numbers with standard normal distribution.

Plot the data by gnuplot:

set xlabel "x1"
set ylabel "x2"
set zlabel "y"
splot "data.dat"


Python implementation:

import numpy as np
import numpy.linalg as linalg

# Load data
data = np.loadtxt(fname="data.dat",dtype=np.float32)
# Create X_0 term
X0 = np.array([[1.0 for _ in range(len(data))]])
X1 = np.array([[ele[0] for ele in data]])
X2 = np.array([[ele[1] for ele in data]])
X = np.concatenate((X0,X1,X2),axis=0).T
y = np.array([[ele[2] for ele in data]]).T
theta = np.matmul(linalg.inv(np.matmul(X.T,X)),np.matmul(X.T,y))

np.matmul(X,Y) performs the matrix multiplication of 2 matrices X and Y, and linalg.inv(X) returns the inverse of the matrix X.

stdout: [[4.08673693], [2.82390453], [1.13030646]]

We can view the best fitting plane by typing this in gnuplot:

f(x,y) = 4.08673693+2.82390453*x+1.13030646*y
replot f(x,y)




Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s