Cheatography

# Econometrics Cheat Sheet (DRAFT) by dsjac3

This is a draft cheat sheet. It is a work in progress and is not finished yet.

### Properties of OLS (Matrix Form)

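**Deriving the OLS estimator**

- Sum of squared residuals:
  $$SSR = (y - X\hat\beta)'(y - X\hat\beta) = y'y - \hat\beta'X'y - y'X\hat\beta + \hat\beta'X'X\hat\beta = y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta$$
  (the two middle terms are equal scalars, since $y'X\hat\beta = (\hat\beta'X'y)'$).
- Minimise the SSR: $\partial SSR / \partial\hat\beta = -2X'y + 2X'X\hat\beta = 0$.
- From the minimum we get the "normal equations": $(X'X)\hat\beta = X'y$.
- Solve for the OLS estimator $\hat\beta$ by pre-multiplying both sides by $(X'X)^{-1}$ (requires $X$ to have full column rank so that $X'X$ is invertible): $(X'X)^{-1}(X'X)\hat\beta = (X'X)^{-1}X'y$.
- By definition $(X'X)^{-1}(X'X) = I$, so $I\hat\beta = (X'X)^{-1}X'y$, i.e.
  $$\hat\beta = (X'X)^{-1}X'y$$

**Properties**

- The observed values of $X$ are uncorrelated with the residuals: $X'e = 0$, which implies $x_k'e = 0$ for every column $x_k$ of $X$. Proof: substitute $y = X\hat\beta + e$ into the normal equations, $(X'X)\hat\beta = X'(X\hat\beta + e) = (X'X)\hat\beta + X'e$, so $X'e = 0$.
- The sum of the residuals is zero (when the model includes a constant). The first column of $X$ is then a column of ones, so the first element of $X'e$ is $1 \cdot e_1 + 1 \cdot e_2 + \dots + 1 \cdot e_n = \sum_i e_i$, and for it to be zero we need $\sum_i e_i = 0$.
- The sample mean of the residuals is zero: $\bar e = \sum_i e_i / n = 0$.
- The regression hyperplane passes through the means of the observed values ($X$ and $y$). Since $e = y - X\hat\beta$, averaging over the $n$ observations gives $\bar e = \bar y - \bar x\hat\beta = 0$, so $\bar y = \bar x\hat\beta$, where $\bar x$ is the row vector of column means of $X$.
- The predicted values of $y$ are uncorrelated with the residuals: $\hat y'e = (X\hat\beta)'e = \hat\beta'X'e = 0$.
- The mean of the predicted $y$'s equals the mean of the observed $y$'s: $\bar{\hat y} = \bar y$ (because $y = \hat y + e$ and $\bar e = 0$).

**Gauss-Markov theorem** (under the Gauss-Markov assumptions, OLS is the best linear unbiased estimator, BLUE; the steps below show the "linear" and "unbiased" parts)

- Proof that $\hat\beta$ is an unbiased estimator of $\beta$: $\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$, given $(X'X)^{-1}X'X = I$. Then $E[\hat\beta] = \beta + E[(X'X)^{-1}X'\varepsilon] = \beta + (X'X)^{-1}X'E[\varepsilon] = \beta$, treating $X$ as fixed with $E[\varepsilon] = 0$ (so $E[X'\varepsilon] = 0$).
- Proof that $\hat\beta$ is a linear estimator of $\beta$: with $A = (X'X)^{-1}X'$, $\hat\beta = Ay = \beta + A\varepsilon$, a linear function of $y$ (and of $\varepsilon$).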
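The derivation and properties above can be checked numerically. This is a minimal sketch, assuming NumPy is available and using hypothetical simulated data (the true coefficients are arbitrary); it solves the normal equations and confirms that $X'e$, $\sum e_i$, $\bar y - \bar x\hat\beta$, $\hat y'e$ and $\bar{\hat y} - \bar y$ are all zero up to rounding.

```python
import numpy as np

# Hypothetical simulated data: a constant column plus two regressors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# Normal equations (X'X) beta_hat = X'y; solve() avoids forming (X'X)^-1 explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat   # fitted values
e = y - y_hat          # residuals

print("beta_hat              :", beta_hat)
print("X'e                   :", X.T @ e)                               # ~0 in every column
print("sum of residuals      :", e.sum())                               # ~0 (X has a constant)
print("ybar - xbar @ beta_hat:", y.mean() - X.mean(axis=0) @ beta_hat)  # ~0
print("yhat'e                :", y_hat @ e)                             # ~0
print("mean(yhat) - mean(y)  :", y_hat.mean() - y.mean())               # ~0
```

Solving the linear system rather than inverting $X'X$ gives the same $\hat\beta = (X'X)^{-1}X'y$ but is numerically more stable.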

### Heteroskedasticity

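- Consequence: the usual OLS standard errors, and therefore the t and F statistics used to test hypotheses under the Gauss-Markov assumptions, are not valid in the presence of heteroskedasticity (the OLS coefficient estimates themselves remain unbiased).
- Valid variance estimator under heteroskedasticity of any form (simple regression):
  $$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\sum_i (x_i - \bar x)^2 \hat u_i^2}{SST_x^2}, \qquad SST_x = \sum_i (x_i - \bar x)^2$$
- Robust standard error (multiple regression):
  $$\widehat{\mathrm{Var}}(\hat\beta_j) = \frac{\sum_i \hat r_{ij}^2 \hat u_i^2}{SSR_j^2}$$
  where $\hat r_{ij}$ is the $i$-th residual from regressing $x_j$ on all other independent variables and $SSR_j$ is the sum of squared residuals from that regression. The robust standard error is the square root of this quantity.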
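A minimal sketch of the robust-standard-error formula above, assuming NumPy and hypothetical data whose error spread grows with $|x_1|$. It computes $\sum_i \hat r_{ij}^2 \hat u_i^2 / SSR_j^2$ by partialling out the other regressors, and checks it against the equivalent matrix "sandwich" form $(X'X)^{-1}X'\mathrm{diag}(\hat u^2)X(X'X)^{-1}$ (often called HC0; some software multiplies by a degrees-of-freedom factor, which is not done here).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Assumed heteroskedastic errors: the error s.d. grows with |x1|.
u = rng.normal(size=n) * (0.5 + np.abs(x1))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat


def robust_se_partialled(X, u_hat, j):
    """Robust SE of coefficient j: sqrt(sum(r_ij^2 * u_i^2) / SSR_j^2)."""
    others = np.delete(X, j, axis=1)
    # Regress x_j on all other columns of X and keep the residuals r_ij.
    gamma = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
    r = X[:, j] - others @ gamma
    ssr_j = r @ r
    return np.sqrt(np.sum(r**2 * u_hat**2) / ssr_j**2)


se_partialled = np.array([robust_se_partialled(X, u_hat, j) for j in range(X.shape[1])])

# Same quantity in matrix "sandwich" form: (X'X)^-1 X' diag(u_hat^2) X (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (u_hat**2)[:, None])
se_sandwich = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# Usual (non-robust) SE for comparison: sigma2_hat * (X'X)^-1, sigma2_hat = SSR/(n-k-1)
k = X.shape[1] - 1
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)
se_usual = np.sqrt(np.diag(sigma2_hat * XtX_inv))

print("robust (partialled):", se_partialled)
print("robust (sandwich)  :", se_sandwich)
print("usual OLS          :", se_usual)
```

The two robust versions agree because, by the Frisch-Waugh-Lovell result, row $j$ of $(X'X)^{-1}X'$ equals $\hat r_j' / SSR_j$.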

### Inference

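**Normality assumption**: the population error $u$ is independent of the explanatory variables and normally distributed with zero mean and variance $\sigma^2$: $u \sim \mathrm{Normal}(0, \sigma^2)$.

**t-test** (used in testing hypotheses about a single population parameter)

- $(\hat\beta_j - \beta_j)/se(\hat\beta_j) \sim t_{n-k-1} = t_{df}$
- For $H_0: \beta_j = 0$, the test statistic is $t_{\hat\beta_j} = \hat\beta_j / se(\hat\beta_j)$.
- In general: $t = (\text{estimate} - \text{hypothesised value}) / \text{standard error}$.

**Alternative hypotheses** (5% significance level, critical value $c$)

- One-sided, $H_1: \beta_j > 0$: reject $H_0$ if $t_{\hat\beta_j} > c$ ($c$ = 95th percentile, 5% in the upper tail).
- One-sided, $H_1: \beta_j < 0$: reject $H_0$ if $t_{\hat\beta_j} < -c$.
- Two-sided, $H_1: \beta_j \neq 0$: reject $H_0$ if $|t_{\hat\beta_j}| > c$ ($c$ = 97.5th percentile, 2.5% in each tail).
- If $H_0$ is rejected, $x_j$ is statistically significant (significantly different from zero) at the 5% level; if $H_0$ is not rejected, $x_j$ is statistically insignificant at the 5% level.

**p-value**: the smallest significance level at which the null hypothesis would be rejected.

**Confidence interval**: $\hat\beta_j \pm c \cdot se(\hat\beta_j)$, where $c$ is the 97.5th percentile of the $t_{n-k-1}$ distribution (95% CI). Given the CI, $H_0: \beta_j = a_j$ is rejected against $H_1: \beta_j \neq a_j$ at the 5% significance level if $a_j$ is not in the 95% confidence interval.

**Testing a linear combination**, e.g. $H_0: \beta_1 = \beta_2$ against $H_1: \beta_1 < \beta_2 \Leftrightarrow \beta_1 - \beta_2 < 0$

- $t = (\hat\beta_1 - \hat\beta_2) / se(\hat\beta_1 - \hat\beta_2)$, where $se(\hat\beta_1 - \hat\beta_2) = \sqrt{\mathrm{Var}(\hat\beta_1 - \hat\beta_2)}$ and $\mathrm{Var}(\hat\beta_1 - \hat\beta_2) = \mathrm{Var}(\hat\beta_1) + \mathrm{Var}(\hat\beta_2) - 2\,\mathrm{Cov}(\hat\beta_1, \hat\beta_2)$.
- Alternative to calculating $se(\hat\beta_1 - \hat\beta_2)$: let $\theta = \beta_1 - \beta_2$, so $\beta_1 = \theta + \beta_2$, and test $H_0: \theta = 0$ against $H_1: \theta < 0$. Substituting $\beta_1 = \theta + \beta_2$ into $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$ gives $y = \beta_0 + \theta x_1 + \beta_2(x_1 + x_2) + \beta_3 x_3 + u$. Regressing $y$ on $x_1$, $(x_1 + x_2)$ and $x_3$ gives $\hat\theta$ and $se(\hat\theta)$ directly as the coefficient on $x_1$.

**F test** (testing joint exclusion restrictions)

- $F = \dfrac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$
- $q$ = number of restrictions $= df_r - df_{ur}$; $n - k - 1 = df_{ur}$.
- $R^2$ form of the F statistic: using $SSR = SST(1 - R^2)$, $F = \dfrac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/df_{ur}}$ (valid when both models have the same dependent variable). Use $R^2$ as reported: it is already squared, so do not square it again.
- Overall significance of the regression (joint exclusion of all $k$ regressors, $H_0$: all slope coefficients are zero): $F = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$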
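The inference formulas above can be sketched with NumPy on hypothetical simulated data. Assumptions to note: the model and coefficients are invented, $c = 1.96$ is only a large-sample approximation to the 97.5th percentile of $t_{n-k-1}$ (an exact quantile can be taken from tables or `scipy.stats.t.ppf`), and no p-values are computed. The sketch checks that the reparameterised regression reproduces $\hat\beta_1 - \hat\beta_2$ and its standard error, and that the SSR and $R^2$ forms of the F statistic agree.

```python
import numpy as np


def ols(X, y):
    """OLS of y on X (X includes a constant): coefficients, SEs, covariance matrix, SSR, R^2."""
    n, p = X.shape                     # p = k + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    ssr = e @ e
    V = (ssr / (n - p)) * XtX_inv      # sigma2_hat * (X'X)^-1 with df = n - k - 1
    sst = np.sum((y - y.mean()) ** 2)
    return b, np.sqrt(np.diag(V)), V, ssr, 1.0 - ssr / sst


# Hypothetical simulated data
rng = np.random.default_rng(2)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 0.5 + 1.0 * x1 + 0.8 * x2 + rng.normal(size=n)

# Unrestricted model: y = b0 + b1 x1 + b2 x2 + b3 x3 + u
X_ur = np.column_stack([np.ones(n), x1, x2, x3])
b, se, V, ssr_ur, r2_ur = ols(X_ur, y)
k = X_ur.shape[1] - 1
df_ur = n - k - 1

# t statistic for H0: beta_1 = 0 and an approximate 95% CI.
t_b1 = b[1] / se[1]
c = 1.96   # large-sample approximation to the 97.5th percentile of t(df_ur)
ci_b1 = (b[1] - c * se[1], b[1] + c * se[1])

# H0: beta_1 = beta_2 via Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2)
se_diff = np.sqrt(V[1, 1] + V[2, 2] - 2 * V[1, 2])
t_diff = (b[1] - b[2]) / se_diff

# Same test via reparameterisation: y on 1, x1, (x1 + x2), x3; the x1 coefficient is theta_hat.
X_theta = np.column_stack([np.ones(n), x1, x1 + x2, x3])
b_th, se_th, _, _, _ = ols(X_theta, y)
theta_hat, se_theta = b_th[1], se_th[1]

# F test of H0: beta_2 = beta_3 = 0 (q = 2): the restricted model drops x2 and x3.
q = 2
_, _, _, ssr_r, r2_r = ols(np.column_stack([np.ones(n), x1]), y)
F_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
F_r2 = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# Overall significance: all k slopes zero (restricted model = constant only, so SSR_r = SST).
F_overall = (r2_ur / k) / ((1 - r2_ur) / df_ur)
sst = np.sum((y - y.mean()) ** 2)
F_overall_ssr = ((sst - ssr_ur) / k) / (ssr_ur / df_ur)

print("t for beta_1:", t_b1, " approx 95% CI:", ci_b1)
print("t for beta_1 - beta_2:", t_diff, " theta_hat / se:", theta_hat, se_theta)
print("F (SSR form):", F_ssr, " F (R^2 form):", F_r2)
print("overall F:", F_overall)
```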

### Data Scaling

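**Changes of units**

- If $x_j$ is multiplied by $c$, its coefficient (and standard error) is divided by $c$.
- If the dependent variable is multiplied by $c$, all OLS coefficients (and standard errors) are multiplied by $c$.
- Neither $t$ nor $F$ statistics are affected by such rescaling.

**Beta coefficients**: obtained from an OLS regression after the dependent and independent variables have been transformed into z-scores; each equals $\hat\beta_j \cdot \hat\sigma_{x_j} / \hat\sigma_y$.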
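A minimal NumPy sketch of the scaling rules above, using hypothetical simulated data and an arbitrary factor $c = 10$. It rescales $x_1$, then $y$, and compares coefficients and t statistics; it also computes beta coefficients from z-scored data and checks them against $\hat\beta_j \hat\sigma_{x_j}/\hat\sigma_y$.

```python
import numpy as np


def ols_t(X, y):
    """Return OLS coefficients, usual standard errors and t statistics."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt(np.diag((e @ e) / (n - p) * XtX_inv))
    return b, se, b / se


def zscore(v):
    """Standardise a variable: subtract the mean, divide by the sample s.d."""
    return (v - v.mean()) / v.std(ddof=1)


# Hypothetical simulated data
rng = np.random.default_rng(3)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 2.0 + 0.7 * x1 - 0.3 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
b, se, t = ols_t(X, y)

c = 10.0
# (1) Multiply x1 by c: only its coefficient (and SE) is divided by c.
X_scaled = X.copy()
X_scaled[:, 1] *= c
b_x, se_x, t_x = ols_t(X_scaled, y)

# (2) Multiply y by c: every coefficient (and SE) is multiplied by c.
b_y, se_y, t_y = ols_t(X, c * y)

# (3) Beta coefficients: regress z-scored y on z-scored regressors (all means are 0, so no constant).
Z = np.column_stack([zscore(x1), zscore(x2)])
beta_std = np.linalg.lstsq(Z, zscore(y), rcond=None)[0]
beta_std_alt = b[1:] * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)

print("b:", b, "| after x1*c:", b_x, "| after y*c:", b_y)
print("t:", t, "| after x1*c:", t_x, "| after y*c:", t_y)
print("beta coefficients:", beta_std, beta_std_alt)
```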

### Dummy Variables

- Dummy (binary) variables: yes/no variables that take on the values 0 and 1 to identify the mutually exclusive classes of a qualitative explanatory variable. They lead to regression models whose parameters have very natural interpretations.
- Example: $wage = \beta_0 + \delta_0\,female + \beta_1\,educ + u$. Then $\delta_0 = E(wage \mid female, educ) - E(wage \mid male, educ)$, where the level of education is the same.
- Graphically, $\delta_0$ is an intercept shift: male intercept $= \beta_0$, female intercept $= \beta_0 + \delta_0$.
- Dummy variable trap: including both dummy variables (male and female) together with an intercept results in perfect collinearity. If a qualitative variable has $m$ levels, then $m - 1$ dummy variables are required, each taking the values 0 and 1.
- Hypothesis tests (assuming a model with a dummy $D$ and an interaction term, $y = \beta_0 + \beta_1 x + \beta_2 D + \beta_3 D \cdot x + u$, so that $\beta_2$ shifts the intercept and $\beta_3$ shifts the slope):
  - Are the two regression models identical? $H_0: \beta_2 = \beta_3 = 0$, $H_1: \beta_2 \neq 0$ and/or $\beta_3 \neq 0$. Failing to reject $H_0$ indicates that a single model is sufficient to explain the relationship.
  - Do the two models differ with respect to intercepts only (same slopes)? $H_0: \beta_3 = 0$, $H_1: \beta_3 \neq 0$.
- Treating a quantitative variable as a qualitative variable increases the complexity of the model and reduces the degrees of freedom for error, which can affect the inferences if the data set is small.
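The intercept-shift interpretation and the dummy variable trap can be sketched with NumPy. The data and the values $\beta_0 = 1$, $\delta_0 = -1.5$, $\beta_1 = 0.6$ are hypothetical. The sketch fits the wage equation with male as the base group, then refits with one dummy per group and no constant (which recovers the two intercepts directly), and finally shows that a constant plus both dummies has deficient column rank.

```python
import numpy as np

# Hypothetical simulated data: beta0 = 1.0, delta0 = -1.5, beta1 = 0.6 (assumed values).
rng = np.random.default_rng(4)
n = 400
female = rng.integers(0, 2, size=n).astype(float)   # 1 = female, 0 = male
male = 1.0 - female
educ = rng.integers(8, 21, size=n).astype(float)     # years of education, 8 to 20
wage = 1.0 - 1.5 * female + 0.6 * educ + rng.normal(size=n)

# wage = b0 + d0*female + b1*educ + u  (male is the base group)
X = np.column_stack([np.ones(n), female, educ])
b0, d0, b1 = np.linalg.solve(X.T @ X, X.T @ wage)

# Equivalent parameterisation without a constant: one dummy per group gives each
# group's intercept directly (male intercept = b0, female intercept = b0 + d0).
X_groups = np.column_stack([male, female, educ])
a_male, a_female, b1_alt = np.linalg.solve(X_groups.T @ X_groups, X_groups.T @ wage)

# Dummy variable trap: constant + female + male are perfectly collinear (male + female = 1).
X_trap = np.column_stack([np.ones(n), female, male, educ])
rank_trap = np.linalg.matrix_rank(X_trap)

print("b0, d0, b1:", b0, d0, b1)
print("male / female intercepts:", a_male, a_female)
print("rank of trap design:", rank_trap, "of", X_trap.shape[1], "columns")
```

Because $X'X$ for the trap design is singular, solving its normal equations would fail or be meaningless; dropping either the constant or one dummy restores full rank.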