Real data almost never falls on a perfectly straight line; that is, real data rarely exhibits a perfectly linear relationship. The departures from the line are errors, which can arise from:
- Measurement error: continuous variables cannot be measured with 100% accuracy.
- Effects of variables not included in the model.
- Natural variability.
We incorporate these errors into the simple linear regression model, e.g.

yᵢ = β₀ + β₁xᵢ + eᵢ

where eᵢ is the error on the ith case, and

E(yᵢ) = β₀ + β₁xᵢ

is the true regression line.
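As a sketch with purely illustrative numbers (the parameter values and sample size below are assumptions, not from the notes), we can simulate data from this model and see that least squares recovers β₀ and β₁:

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = 2.0, 0.5, 1.0    # illustrative true parameters
x = rng.uniform(0, 10, size=200)
e = rng.normal(0, sigma, size=200)     # e_i ~ N(0, sigma^2)
y = beta0 + beta1 * x + e              # y_i = beta0 + beta1*x_i + e_i

# Least-squares fit: design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(b0_hat, b1_hat)                  # close to 2.0 and 0.5
```

With 200 points the estimates land close to the true values; the scatter of yᵢ around the fitted line is what the error term eᵢ captures.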
Assumptions about errors:
We make these assumptions because we need them to...
- prove the optimality of the estimates of β₀ and β₁
- derive confidence intervals for β₀ and β₁
eᵢ ~ NID(0, σ²)
- N: Normally
- I: Independently
- D: Distributed
- with mean 0 and common variance σ²
- "eᵢ is normally and independently distributed with mean 0 and common variance σ²"
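A quick numerical illustration of the NID assumptions (sample size and σ chosen arbitrarily): independent draws from N(0, σ²) have sample mean near 0, sample variance near σ², and essentially no correlation between successive draws.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                              # illustrative common standard deviation
e = rng.normal(0, sigma, size=100_000)   # e_i ~ NID(0, sigma^2)

print(e.mean())                          # near 0
print(e.var())                           # near sigma^2 = 4
# independence: correlation between e_i and e_{i+1} is near 0
print(np.corrcoef(e[:-1], e[1:])[0, 1])
```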
These assumptions can also be expressed in terms of covariance:
E(eᵢ) = 0, var(eᵢ) = σ², cov(eᵢ, eⱼ) = 0 for i ≠ j
- "The expected value of eᵢ is 0, its variance is σ², and the covariance of eᵢ and eⱼ is 0 for i ≠ j"
Combined with the normality assumption, zero covariance implies that the eᵢ are independent (jointly normal variables that are uncorrelated are independent).
These assumptions must be verified whenever a regression model is applied.
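One minimal way to start verifying them (a sketch on simulated data, not a full diagnostic workflow) is to fit the model and inspect the residuals, which estimate the eᵢ: their mean should be near 0 and their spread roughly constant across the range of x.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, size=200))
y = 1.0 + 0.8 * x + rng.normal(0, 0.5, size=200)  # simulated data, illustrative values

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat                  # residuals estimate the e_i

print(resid.mean())                       # ~0 exactly, by construction of OLS
# crude constant-variance check: compare spread in the low-x vs high-x halves
print(resid[:100].std(), resid[100:].std())   # both near 0.5
```

In practice one would also plot residuals against fitted values and use a normal Q–Q plot to check the normality assumption.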