Hypothesis Testing Cheat Sheet

This cheat sheet gives a basic introduction to hypothesis testing and summarizes the most common statistical tests.


Statistical hypothesis
a hypothesis that is testable on the basis of observed data modeled as the realized values taken by a collection of random variables
Statistical hypothesis testing
a statistical way of testing an assumption about a population parameter

Steps of hypothesis testing

1. State the two hypotheses: the null hypothesis and the alternative hypothesis
2. Set the significance level, usually α = 0.05
3. Carry out the test: calculate the test statistic and the corresponding p-value
4. Compare the p-value with the significance level, then decide whether to reject or fail to reject the null hypothesis
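
The decision in step 4 can be sketched in a few lines of Python. This is a minimal illustration that assumes the p-value has already been computed in step 3; the function name `decide` and the example p-values are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Return the step-4 decision for a p-value and a significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Reject the null hypothesis only when the p-value is at or below alpha.
    return "reject null" if p_value <= alpha else "fail to reject null"

print(decide(0.03))              # p-value below the default alpha of 0.05
print(decide(0.20))              # p-value above the default alpha of 0.05
print(decide(0.03, alpha=0.01))  # same p-value, stricter significance level
```

Note that the same p-value can lead to different decisions under different significance levels, which is why α is fixed in step 2 before the data are examined.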

Errors in Testing

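Type I error
rejecting the null hypothesis when it is true
α = P(Type I error), the significance level
correct inference (not rejecting a true null) has probability 1 - α, the confidence level
Type II error
failing to reject the null hypothesis when it is false
β = P(Type II error)
correct inference (rejecting a false null) has probability 1 - β, the power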
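
A small simulation can make the meaning of α concrete. The sketch below assumes a one-sample t-test on normal data with n = 10 and uses the tabulated two-sided 5% critical value for 9 degrees of freedom (2.262). The function names, seed, and trial count are illustrative choices. Because the null hypothesis is true in every simulated sample, the rejection rate estimates P(Type I error) and should land near 0.05.

```python
import random
from math import sqrt
from statistics import mean, stdev

N = 10          # sample size used in every simulated experiment
T_CRIT = 2.262  # tabulated two-sided 5% critical value of t with N - 1 = 9 df

def rejects_null(sample, mu0=0.0):
    """Two-sided one-sample t-test decision at the 5% level."""
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))
    return abs(t) > T_CRIT

def type_i_error_rate(trials=5000, seed=0):
    """Share of simulated experiments that reject a null that is actually true."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([rng.gauss(0.0, 1.0) for _ in range(N)])  # true mean is 0
        for _ in range(trials)
    )
    return rejections / trials

rate = type_i_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```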

Chi-Square Test

Test of independence
tests whether two categorical variables are independent
Test of homogeneity
tests whether two or more subgroups of a population share the same multinomial distribution
Goodness of fit
tests whether a multinomial model for the population distribution (p1, ..., pm) fits the data
The tests of independence and homogeneity share the same test statistic and degrees of freedom; they differ only in the design of the experiment

Assumptions about data:
1. one or two categorical variables
2. independent observations
3. mutually exclusive outcomes
4. large n, with no more than 20% of expected counts below 5
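
The expected-count condition is easy to check mechanically. The helper below is a hypothetical sketch (the name `expected_counts_ok` and the example counts are made up); it only encodes the rule of thumb that at most 20% of expected counts may fall below 5.

```python
def expected_counts_ok(expected, threshold=5, max_share=0.20):
    """Return True when at most `max_share` of expected counts are below `threshold`."""
    small = sum(1 for e in expected if e < threshold)
    return small / len(expected) <= max_share

few_small = [12.0, 8.5, 6.1, 4.2, 9.3, 7.7]  # 1 of 6 cells is below 5
many_small = [3.0, 4.5, 6.0, 10.0]           # 2 of 4 cells are below 5

print(expected_counts_ok(few_small))
print(expected_counts_ok(many_small))
```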


ANOVA (Analysis of Variance)
compares the means of two or more continuous populations
One-way layout
a test that compares the means of two or more groups of data defined by a single factor
Two-way layout
a test that compares the means of two or more groups of data when two factors (independent variables) are considered
Assumptions about data:
1. each observation y is normally distributed within its treatment group
2. the variance is the same in every treatment group
3. all observations are independent


Two-sample T-test
tests whether two independent groups have different means
Paired T-test
tests whether one group has different means at different times
One-sample T-test
tests the mean of a single group against a known mean
Assumptions about data:
1. observations are independent
2. data are normally distributed
3. the groups being compared have a similar amount of variance

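One-sample T-test statistic

$t = \frac{m - \mu}{s / \sqrt{n}}$
m = the mean of the sample
μ = the known mean the sample is tested against
s = the standard deviation of the sample
n = the size of the sample
degrees of freedom = n - 1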
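
As a rough sketch, the one-sample statistic is the sample mean minus the known mean, divided by the standard error s/√n. The function name `one_sample_t` and the data are made up for illustration; turning the statistic into a p-value would still require a t-table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    m = mean(sample)   # sample mean
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t = (m - mu0) / (s / sqrt(n))
    return t, n - 1

weights = [19.5, 20.1, 20.4, 19.8, 20.7, 20.3, 19.9, 20.5]  # made-up measurements
t_stat, df = one_sample_t(weights, mu0=20.0)
print(f"t = {t_stat:.3f}, df = {df}")
```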

Paired T-test statistic

$t = \frac{m}{s / \sqrt{n}}$
m = the mean of the differences between the two paired sets of data
n = the number of differences (pairs)
s = the standard deviation of the differences between the two paired sets of data
degrees of freedom = n - 1
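
The paired test reduces to a one-sample test on the differences, tested against a mean of 0. A minimal sketch, with a hypothetical function name and made-up before/after scores:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per pair
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

before = [10, 12, 9, 11, 13]   # made-up scores at the first time point
after = [12, 13, 12, 12, 16]   # made-up scores for the same subjects later
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```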

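Independent two-sample T-test statistic

$t = \frac{m_A - m_B}{\sqrt{S^2/n_A + S^2/n_B}}$
m_A, m_B = the means of groups A and B respectively
n_A, n_B = the sizes of groups A and B respectively
S^2 = the pooled variance of the two samples, $S^2 = \frac{\sum (x - m_A)^2 + \sum (x - m_B)^2}{n_A + n_B - 2}$
degrees of freedom = n_A + n_B - 2 (given that the two samples have the same variance)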
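
Under the equal-variance assumption, the two groups share one pooled variance estimate. The sketch below is illustrative only: the function name and the two small groups are made up, and it does not cover the unequal-variance case.

```python
from math import sqrt
from statistics import mean

def pooled_two_sample_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    # Sums of squared deviations around each group's own mean.
    ss_a = sum((x - m_a) ** 2 for x in group_a)
    ss_b = sum((x - m_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (m_a - m_b) / sqrt(pooled_var / n_a + pooled_var / n_b)
    return t, df

group_a = [5, 7, 9]      # made-up data
group_b = [2, 4, 6, 8]   # made-up data
t_stat, df = pooled_two_sample_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```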

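Test of independence and test of homogeneity

$\chi^2 = \sum_{r,c} \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$
$E_{r,c} = \frac{N_r N_c}{n}$
df = (r - 1) * (c - 1)
O_{r,c}, E_{r,c} = observed and expected counts in row r, column c
N_r, N_c = totals of row r and column c; n = total number of observations
in df, r = number of rows and c = number of columns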
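
For a contingency table, each expected count is the row total times the column total divided by the grand total, and the statistic sums (observed - expected)² / expected over all cells. A minimal sketch with a hypothetical function name and a made-up 2x2 table:

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

table = [[20, 30],
         [30, 20]]  # made-up counts: rows are groups, columns are outcomes
chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The same function serves the test of homogeneity, since the two tests differ only in how the data were collected.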

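Goodness of fit test

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
O = observed count in each category
E = expected count in each category under the model
k = number of parameters estimated from the data
df = n - 1 - k, where n is the number of categories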
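
The goodness-of-fit statistic compares observed counts with the counts a model predicts. The sketch below assumes a fully specified model (no estimated parameters, so k = 0); the function name and the die-roll counts are made up for illustration.

```python
def chi_square_gof(observed, probs, k=0):
    """Return (chi-square statistic, degrees of freedom) for a multinomial model.

    observed: counts per category
    probs:    model probabilities per category (should sum to 1)
    k:        number of model parameters estimated from the data
    """
    total = sum(observed)
    expected = [total * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - k
    return chi2, df

rolls = [8, 12, 9, 11, 10, 10]  # made-up counts for the six faces of a die
chi2, df = chi_square_gof(rolls, probs=[1 / 6] * 6)
print(f"chi-square = {chi2:.3f}, df = {df}")
```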

Carrying out a one-way ANOVA test

total sum of squares
$SST = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2$
within-group (intra-group) sum of squares
$SSW = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2$
between-group (inter-group) sum of squares
$SSB = J \sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2$
Y_{ij} = observation j in treatment i; \bar{Y}_{i.} = mean of treatment i; \bar{Y}_{..} = overall mean of Y
Null hypothesis: the differential effect of every treatment is 0
Alternative hypothesis: not all differential effects are 0


Test statistic:

$F_{I-1,\, I(J-1)} = \frac{SSB / (I - 1)}{SSW / [I(J - 1)]}$

I = number of different treatments
J = number of observations within each treatment
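
The F statistic for a balanced one-way layout can be sketched as follows. This assumes every treatment has the same number of observations J; the function name and the three small groups are made up for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F statistic, (df between, df within)) for a balanced one-way layout."""
    i = len(groups)     # number of treatments
    j = len(groups[0])  # observations per treatment
    if any(len(g) != j for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand_mean = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of the treatment means.
    ssb = j * sum((gm - grand_mean) ** 2 for gm in group_means)
    # Within-group sum of squares: spread of observations around their own mean.
    ssw = sum((y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g)
    df_between, df_within = i - 1, i * (j - 1)
    f = (ssb / df_between) / (ssw / df_within)
    return f, (df_between, df_within)

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # made-up data, I = 3 and J = 3
f_stat, dfs = one_way_anova_f(groups)
print(f"F = {f_stat:.3f}, df = {dfs}")
```

A large F means the treatment means vary more than the within-group noise would explain, which is evidence against the null hypothesis.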


