First, we have to prepare a data file. The data are generated with the following equation:

, where .

It is expected to get very close to

# data.py import numpy as np import numpy.random as rnd rnd.seed(2342) X1 = 2.0*rnd.rand(100,1) X2 = 2.0*rnd.rand(100,1) y = 4.0 + 3.0*X1 + X2 + rnd.randn(100,1) with open("data.dat","w") as f: for i in range(len(X1)): f.write("%f\t%f\t%f\n"%(X1[i],X2[i],y[i]))

rnd.randn(100,1) generates a array of random numbers with standard normal distribution.

Plot the data by gnuplot:

set xlabel "x1" set ylabel "x2" set zlabel "y" splot "data.dat"

Python implementation:

# normal_equation.py import numpy as np import numpy.linalg as linalg # Load data data = np.loadtxt(fname="data.dat",dtype=np.float32) # Create X_0 term X0 = np.array([[1.0 for _ in range(len(data))]]) X1 = np.array([[ele[0] for ele in data]]) X2 = np.array([[ele[1] for ele in data]]) X = np.concatenate((X0,X1,X2),axis=0).T y = np.array([[ele[2] for ele in data]]).T theta = np.matmul(linalg.inv(np.matmul(X.T,X)),np.matmul(X.T,y)) print(theta)

np.matmul(X,Y) performs the matrix multiplication of 2 matrices X and Y, and linalg.inv(X) returns the inverse of the matrix X.

stdout: [[4.08673693], [2.82390453], [1.13030646]]

We can view the best fitting plane by typing this in gnuplot:

f(x,y) = 4.08673693+2.82390453*x+1.13030646*y replot f(x,y)

GitHub: https://github.com/DropletX/Data-Analysis/tree/master/Linear-Model/Python