Python – Fitting a Linear Model by the Normal Equation


First, we have to prepare a data file. The data are generated with the following equation:

y = 4.0+3.0X_{1}+X_{2}+\delta

, where \delta\sim N(0,1).

It is expected to get \theta very close to (4.0,3.0,1.0)

import numpy as np
import numpy.random as rnd

X1 = 2.0*rnd.rand(100,1)
X2 = 2.0*rnd.rand(100,1)
y = 4.0 + 3.0*X1 + X2 + rnd.randn(100,1)

with open("data.dat","w") as f:
    for i in range(len(X1)):

rnd.randn(100,1) generates a 100\times1 array of random numbers with standard normal distribution.

Plot the data by gnuplot:

set xlabel "x1"
set ylabel "x2"
set zlabel "y"
splot "data.dat"


Python implementation:

import numpy as np
import numpy.linalg as linalg

# Load data
data = np.loadtxt(fname="data.dat",dtype=np.float32)
# Create X_0 term
X0 = np.array([[1.0 for _ in range(len(data))]])
X1 = np.array([[ele[0] for ele in data]])
X2 = np.array([[ele[1] for ele in data]])
X = np.concatenate((X0,X1,X2),axis=0).T
y = np.array([[ele[2] for ele in data]]).T
theta = np.matmul(linalg.inv(np.matmul(X.T,X)),np.matmul(X.T,y))

np.matmul(X,Y) performs the matrix multiplication of 2 matrices X and Y, and linalg.inv(X) returns the inverse of the matrix X.

stdout: [[4.08673693], [2.82390453], [1.13030646]]

We can view the best fitting plane by typing this in gnuplot:

f(x,y) = 4.08673693+2.82390453*x+1.13030646*y
replot f(x,y)




Fitting a Linear Model by the Normal Equation

We want to find a set of \theta_{i} which best fits the equation:


, given the m training examples’ input values are

\begin{bmatrix}-(x^{(1)})^{T}-\\ -(x^{(2)})^{T}-\\ \vdots \\ -(x^{(m)})^{T}-\end{bmatrix}

, which is a m\times(n+1) matrix.

Notice that It is n+1 rather than n, since we may have a non-zero intercept term, so we add x^{(i)}_{0}=1 to every training example.

And the target values are

\begin{bmatrix}y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\end{bmatrix}

, which is a m\times1 vector.

We want to find a set of \theta_{i} such that J(\theta)=\frac{1}{2}(X\theta-\vec{y})^{T}(X\theta-\vec{y}) is minimized.

Perform the matrix derivative to J(\theta):


Set the derivative to zero, hence


sqlite3 (1) – Programming in an object-oriented style

SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. sqlite3 is a Python package that provides a SQL interface.

To program in an object-oriented style:

import sqlite3

class Database:
    def __init__(self,db):
        Connect to database or create a database if not exists.
        db: database file name.
        self.conn = sqlite3.connect(db)
        self.cursor = self.conn.cursor()

    def __del__(self):
        Close connection to database.

    def dbcommit(self):
        Write changes to database.

When an object is instantiated, it is connected to the database. db is the database file name, such as “stock_quotes.db”. If the file does not exist, a new database file will be created.



1) sqlite3 — DB-API 2.0 interface for SQLite databases

Fortran – Programming Primitive Vectors of Reciprocal Lattice

Suppose \vec{a_{1}}, \vec{a_{2}} and \vec{a_{3}} are the 3 primitive vectors of a direct lattice. The primitive vectors of the reciprocal lattice are given by

\vec{b_{1}}=2\pi\frac{\vec{a_{2}}\times \vec{a_{3}}}{\Omega_{a}}

\vec{b_{2}}=2\pi\frac{\vec{a_{3}}\times \vec{a_{1}}}{\Omega_{a}}

\vec{b_{3}}=2\pi\frac{\vec{a_{1}}\times \vec{a_{2}}}{\Omega_{a}}

, where \Omega_{a} is the volume of the primitive cell.

Fortran 90 code:

program main
  implicit none
  real(8), parameter :: PI=4.D0*datan(1.D0)
  real(8),dimension(3) :: a1,a2,a3,b1,b2,b3
  real(8),dimension(3) :: a,b
  real(8) :: Va

  print *,"Enter x, y, z coordinates of a1:"
  print *,"Enter x, y, z coordinates of a2:"
  print *,"Enter x, y, z coordinates of a3:"
  Va = dot_product(a1,cross_product(a2,a3))
  b1 = cross_product(a2,a3)*2*PI/Va
  b2 = cross_product(a3,a1)*2*PI/Va
  b3 = cross_product(a1,a2)*2*PI/Va
  print *,"b1(1)=",b1(1),"b1(2)=",b1(2),"b1(3)=",b1(3)
  print *,"b2(1)=",b2(1),"b2(2)=",b2(2),"b2(3)=",b2(3)
  print *,"b3(1)=",b3(1),"b3(2)=",b2(2),"b3(3)=",b2(3)
  print *,"Volume of Unit Cell=",Va
  function cross_product(x,y)
    implicit none
    real(8),dimension(3) :: x, y, z, cross_product
  end function cross_product

  real(8) function dot_product(x,y)
    implicit none
    real(8),dimension(3) :: x,y
    dot_product = x(1)*y(1)+x(2)*y(2)+x(3)*y(3)
  end function dot_product
end program main


The user inputs the 3 direct lattice vectors, and the program will print the 3 reciprocal lattice vectors and the volume of the direct lattice.