Sum of Squared Residuals 
SSR = (y − Xβˆ)′(y − Xβˆ)

y′y − βˆ′X′y − y′Xβˆ + βˆ′X′Xβˆ

Since βˆ′X′y is a scalar, it equals its own transpose y′Xβˆ, so the two middle terms combine:

y′y − 2βˆ′X′y + βˆ′X′Xβˆ
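The expansion above can be checked numerically. A minimal sketch with simulated data (the data and the candidate β are assumptions for illustration, not from the notes):

```python
import numpy as np

# Simulated data to verify the SSR expansion numerically.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
b = rng.normal(size=3)          # an arbitrary candidate beta

# Direct form: (y - Xb)'(y - Xb)
ssr_direct = (y - X @ b) @ (y - X @ b)
# Expanded form: y'y - 2 b'X'y + b'X'Xb
ssr_expanded = y @ y - 2 * b @ X.T @ y + b @ X.T @ X @ b

print(np.isclose(ssr_direct, ssr_expanded))  # True
```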
Minimise the SSR 
∂(SSR)/∂βˆ = −2X′y + 2X′Xβˆ = 0 
From this first-order condition we obtain the "normal equations":
(X′X)βˆ = X′y 
Solve for the OLS estimator βˆ by pre-multiplying both sides by (X′X)−1:
(X′X)−1(X′X)βˆ = (X′X)−1X′y 
by definition, (X′X)−1(X′X) = I 
Iβˆ = (X′X)−1X′y 

βˆ = (X′X)−1X′y
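The estimator can be computed directly from the normal equations and compared against a standard least-squares routine. A sketch on simulated data (the sample size, design, and true β here are assumptions for illustration):

```python
import numpy as np

# Simulate a regression with an intercept and two slopes.
rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# beta_hat = (X'X)^{-1} X'y  -- solve the normal equations rather than
# inverting X'X explicitly, which is numerically more stable.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose(beta_hat, beta_lstsq))  # True
```

Using `np.linalg.solve` on the normal equations matches NumPy's least-squares solver on this well-conditioned design.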
Properties 
The observed values of X are uncorrelated with the residuals. 
X′e = 0 implies that for every column xk of X, x′ke = 0. 
Substitute y = Xβˆ + e into the normal equations:
(X′X)βˆ = X′(Xβˆ + e) 

(X′X)βˆ = (X′X)βˆ + X′e 

X′e = 0 
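This orthogonality property can be verified numerically. A sketch with simulated data (the design and coefficients are assumptions for illustration):

```python
import numpy as np

# Simulated regression with a constant column.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=50)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

# Every column of X is orthogonal to the residual vector.
print(np.allclose(X.T @ e, 0))  # True
```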
The sum of the residuals is zero. 
If the model includes a constant, the first column of X (i.e. X1) is a column of ones. The first element of the X′e vector is then X11 × e1 + X12 × e2 + ... + X1n × en = e1 + e2 + ... + en, so X′e = 0 requires Σeᵢ = 0.
The sample mean of the residuals is zero. 
ē = Σeᵢ/n = 0.
The regression hyperplane passes through the means of the observed values (X and y). 
This follows from the fact that ē = 0. Recall that e = y − Xβˆ. Averaging over the n observations gives ē = ȳ − x̄′βˆ = 0, which implies ȳ = x̄′βˆ, where x̄ is the vector of column means of X. This shows that the regression hyperplane goes through the point of means of the data.
The predicted values of y are uncorrelated with the residuals. 
yˆ′e = (Xβˆ)′e = βˆ′X′e = 0
The mean of the predicted y's for the sample equals the mean of the observed y's: since y = yˆ + e and Σeᵢ = 0, averaging gives the mean of yˆ equal to ȳ.
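The remaining properties (zero residual sum, hyperplane through the means, orthogonality of yˆ and e, equal means of yˆ and y) can all be checked in one numerical sketch. The simulated design below is an assumption for illustration:

```python
import numpy as np

# Simulated regression with a constant term, so all properties apply.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=60)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
e = y - y_hat

print(np.isclose(e.sum(), 0))                           # residuals sum to zero
print(np.isclose(X.mean(axis=0) @ beta_hat, y.mean()))  # plane through means
print(np.isclose(y_hat @ e, 0))                         # y_hat orthogonal to e
print(np.isclose(y_hat.mean(), y.mean()))               # equal sample means
```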
The Gauss-Markov Theorem: Proof that βˆ is an unbiased estimator of β
βˆ = (X′X)−1X′y = (X′X)−1X′(Xβ + ε)

βˆ = β + (X′X)−1X′ε

given (X′X)−1X′X = I
E[βˆ] = E[β] + E[(X′X)−1X′ε] = β + (X′X)−1X′E[ε]

treating X as fixed (nonstochastic) and assuming E[ε] = 0, so

E[βˆ] = β
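Unbiasedness can be illustrated with a Monte Carlo sketch: holding X fixed and drawing mean-zero errors repeatedly, the average of βˆ across replications should approach β. The sample sizes and parameters here are assumptions for illustration:

```python
import numpy as np

# Fixed design, mean-zero errors: E[beta_hat] = beta.
rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 1))])
beta = np.array([1.0, 3.0])

draws = []
for _ in range(2000):
    eps = rng.normal(size=n)             # mean-zero errors
    y = X @ beta + eps
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))

beta_bar = np.mean(draws, axis=0)
print(np.allclose(beta_bar, beta, atol=0.02))
```

The average across 2000 replications lands close to the true β; the remaining gap is simulation noise, which shrinks as the number of replications grows.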
Proof that βˆ is a linear estimator of β. 
βˆ = β + (X′X)−1X′ε; let A = (X′X)−1X′

βˆ = β + Aε

Equivalently, βˆ = (X′X)−1X′y = Ay, so βˆ is a linear function of the observations y: a linear estimator.
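The linearity in y can be made concrete: βˆ = Ay with A = (X′X)−1X′, and applying A to a sum of outcome vectors gives the sum of the separate estimates. A sketch on simulated data (design and vectors are assumptions for illustration):

```python
import numpy as np

# beta_hat = A y is linear in y, with A = (X'X)^{-1} X'.
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y1 = rng.normal(size=30)
y2 = rng.normal(size=30)

A = np.linalg.inv(X.T @ X) @ X.T

# A y reproduces the least-squares estimate ...
print(np.allclose(A @ y1, np.linalg.lstsq(X, y1, rcond=None)[0]))  # True
# ... and linearity holds: A(y1 + y2) = A y1 + A y2.
print(np.allclose(A @ (y1 + y2), A @ y1 + A @ y2))  # True
```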