PSCY3000 Cheat Sheet (DRAFT)

This is a draft cheat sheet. It is a work in progress and is not finished yet.

F-Distribution

Skewed right
Mean is approximately 1
Takes only nonnegative values
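The three properties above can be checked with a small simulation. This is only a sketch: it assumes the usual construction of an F value as the ratio of two chi-square variables, each divided by its degrees of freedom, and the values df1 = 3 and df2 = 30, the seed, and the helper name `simulate_f` are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_f(df1, df2, draws=20000, seed=0):
    """Draw F values as a ratio of two chi-square variables, each divided by its df."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        # A chi-square variable with k df is a gamma variable with shape k/2 and scale 2.
        numerator = rng.gammavariate(df1 / 2, 2) / df1
        denominator = rng.gammavariate(df2 / 2, 2) / df2
        values.append(numerator / denominator)
    return values

f_values = simulate_f(df1=3, df2=30)  # df values chosen only for illustration
print("minimum:", min(f_values))                # never negative
print("mean:", statistics.mean(f_values))       # close to 1
print("median:", statistics.median(f_values))   # below the mean, a sign of right skew
```

The mean sitting above the median reflects the long right tail.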

Gathering Data

Treatments: experimental conditions which correspond to assigned values of the explanatory variable.
Observational studies: watch and observe values on the response variable (non experimental)
Advantage of Experiments over Observational Studies: experiments reduce the potential for lurking variables to affect the results by randomly assigning subjects to treatments; an experiment is also the only way to determine causality
Sample Survey: selects sample from population and gathers data
Sampling Frame: list of subjects in the population from which the sample is taken
Simple random sampling: when each possible sample of size n has the same chance of being selected
To select a simple random sample:
1. Number the subjects in the sampling frame using numbers of the same length (number of digits).
2. Select numbers of that length from a table of random numbers or using a random number generator.
3. Include in the sample those subjects having numbers equal to the random numbers selected.
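The selection procedure above can be sketched with a random number generator. The frame of 60 made-up student labels, the seed, and the helper name `simple_random_sample` are assumptions for illustration only.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Number the subjects with equal-length labels, then draw n labels at random."""
    rng = random.Random(seed)
    width = len(str(len(sampling_frame)))  # digits needed for the largest number
    # Step 1: number the subjects with labels of the same length, e.g. "01" to "60".
    numbered = {f"{i:0{width}d}": subject
                for i, subject in enumerate(sampling_frame, start=1)}
    # Step 2: select n distinct labels; every set of n labels is equally likely.
    chosen_numbers = rng.sample(sorted(numbered), n)
    # Step 3: the sample is the subjects carrying the selected labels.
    return [numbered[number] for number in chosen_numbers]

frame = [f"student_{i}" for i in range(1, 61)]  # hypothetical sampling frame
sample = simple_random_sample(frame, n=5, seed=1)
print(sample)
```

Sampling without replacement (`random.sample`) mirrors skipping a repeated number when reading a random number table.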
Margin of Error tells us how well the sample estimate predicts the population percentage. Ex. A survey result says the margin of error is +/- 3%. This MEANS "it is very likely that the reported sample percentage is no more than 3% lower/higher than the population percentage."
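As a rough illustration of where a figure like +/- 3% can come from: a common rule of thumb, which is an assumption here and is not stated in this cheat sheet, approximates the 95% margin of error for a simple random sample of size n as 1/sqrt(n). The sample size of 1000 and the sample percentage of 45 are hypothetical.

```python
import math

def approx_margin_of_error(n):
    """Rule-of-thumb 95% margin of error, as a proportion, for a simple random sample of size n."""
    return 1 / math.sqrt(n)

def likely_range(sample_percentage, n):
    """Range of population percentages within one margin of error of the sample percentage."""
    margin = 100 * approx_margin_of_error(n)
    return (sample_percentage - margin, sample_percentage + margin)

margin_pct = 100 * approx_margin_of_error(1000)      # hypothetical survey of 1000 people
low, high = likely_range(sample_percentage=45, n=1000)
print(f"margin of error: about {margin_pct:.1f}%")
print(f"population percentage is likely between {low:.1f}% and {high:.1f}%")
```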
Bias: When certain outcomes will occur more often in the sample than they do in the popula­tion.
Sampling bias occurs from using nonrandom samples or having underc­ove­rage.
Nonres­ponse bias occurs when some sampled subjects cannot be reached or refuse to partic­ipate or fail to answer some questions.
Response bias occurs when the subject gives an incorrect response (perhaps lying) or the way the interv­iewer asks the questions (or wording of a question in print) is confusing or mislea­ding.
A large sample doesn't guarantee an unbiased sample
Conven­ience Sample problems: results only apply to observed subjects, unlikely to be repres­ent­ative of popula­tion, often severe biases result
Key Parts of a Sample Survey: Identify the population of all subjects of interest. Construct a sampling frame which attempts to list all subjects in the population. Use a random sampling design to select n subjects from the sampling frame. Be cautious of sampling bias due to nonrandom samples (such as volunteer samples) and sample undercoverage, response bias from subjects not giving their true response or from poorly worded questions, and nonresponse bias from refusal of subjects to participate.
3 Components of a Good Experiment:
1. Control Group - receives a placebo; gives a baseline for judging the treatment's effectiveness
2. Randomization - eliminates researcher bias, balances comparison groups on known and lurking variables
3. Replication - allows observed effects to be attributed to the treatment rather than to ordinary variability
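Randomization, as described in the list above, can be sketched as shuffling the subjects and dealing them out to the treatments. The 20 subject IDs, the two treatment names, and the helper name `randomize` are hypothetical.

```python
import random

def randomize(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance, not the researcher, decides the order
    groups = {treatment: [] for treatment in treatments}
    for index, subject in enumerate(shuffled):
        # Deal subjects out in turn so group sizes differ by at most one.
        groups[treatments[index % len(treatments)]].append(subject)
    return groups

assignment = randomize(range(1, 21), ["treatment", "placebo"], seed=42)
for treatment, members in assignment.items():
    print(treatment, sorted(members))
```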
Statis­tically Signif­icant: if observed difference is larger than would be expected by chance
Can generalize only to population repres­ented by sample
 

14.1 One-Way ANOVA: Comparing Several Means

One way ANOVA is an ANOVA with a single factor
Factor: categorical explanatory variable
Test analyzes whether differ­ences observed among the sample means could have reasonably occurred by chance, if the null hypothesis of equal population means were true
Evidence against the null is stronger when the variability between sample means is larger, when the variability within each sample is smaller, or when the sample sizes are larger
Assump­tions and the effects of violating them:
1. Population distributions are normal. (Moderate violations of the normality assumption are not serious.)
2. These distributions have the same standard deviation. (Moderate violations are not serious.)
3. The data resulted from randomization.
Misleading results may occur with the F-test if the distri­butions are highly skewed and the sample size N is small.
Misleading results may also occur with the F-test if there are relatively large differ­ences among the standard deviations (the largest sample standard deviation being more than double the smallest one).
Several T-Tests vs. F-test: If separate t tests are used, the signif­icance level applies to each individual compar­ison, not the overall type I error rate for all the compar­isons. However, the F test does not tell us which groups differ or how different they are.
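The problem with several t tests can be made concrete with a short calculation. This sketch assumes, for simplicity, that the comparisons are independent (pairwise comparisons that share groups are not exactly independent, so the result is only a rough guide); the choice of 4 groups and a 0.05 significance level is hypothetical.

```python
from math import comb

def number_of_pairs(g):
    """Number of pairwise comparisons among g groups."""
    return comb(g, 2)

def familywise_error_rate(alpha, comparisons):
    """Chance of at least one type I error across independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** comparisons

pairs = number_of_pairs(4)                     # 4 groups give 6 pairwise t tests
overall = familywise_error_rate(0.05, pairs)   # each test run at the 0.05 level
print(f"{pairs} comparisons, overall type I error rate about {overall:.3f}")
```

Under this simplification the overall error rate is roughly 0.26, far above the 0.05 that applies to each individual comparison.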

One Way ANOVA example

Question: Three groups, with different French skills, scored on one quiz
Assump­tions
Indepe­ndent Random Samples, normal population distri­butions with equal standard deviations
Hypotheses
H0: μ1 = μ2 = μ3 Ha: at least two population means are unequal
Test statistic
F = between-groups variability / within-groups variability
df1 = g - 1, df2 = N - g (g = number of groups, N = total sample size)
P-Value
Right-tail probability above the observed F value
Conclusion
Interpret in context; reject H0 if the P-value is less than or equal to the significance level
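The test statistic and degrees of freedom in this example can be sketched as follows. The quiz scores are made-up numbers for three groups, not data from the source, and `one_way_anova` is a hypothetical helper name. The P-value step is left out because the standard library has no F distribution.

```python
def one_way_anova(groups):
    """Return (F, df1, df2) for a list of groups of scores."""
    g = len(groups)
    all_scores = [x for group in groups for x in group]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(group) / len(group) for group in groups]
    # Between-groups sum of squares: how far the group means are from the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Within-groups sum of squares: how far the scores are from their own group mean.
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, group_means) for x in group)
    df1 = g - 1
    df2 = n_total - g
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_stat, df1, df2

quiz_scores = [[4, 6, 8], [1, 5], [9, 10, 5]]  # hypothetical scores for three groups
f_stat, df1, df2 = one_way_anova(quiz_scores)
print(f"F = {f_stat:.2f}, df1 = {df1}, df2 = {df2}")
```

If SciPy is available, the right-tail probability could be obtained with `scipy.stats.f.sf(f_stat, df1, df2)`.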

14.2

Confidence Intervals Comparing Pairs of Means
s is the square root of the within-groups variance estimate (s²)
For a 95% confidence interval comparing means μi - μj: when the confidence interval does NOT contain 0, we can infer that the population means are different; the interval shows how different they may be
Example: for comparing the very happy and pretty happy categories, the confidence interval for μ1 - μ2 = (0.7, 5.3)
Since the CI contains only positive numbers, this suggests that, on average, people who are very happy have more friends than people who are pretty happy
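A minimal sketch of how such an interval can be computed, assuming the standard form (difference of sample means) ± t × s × sqrt(1/ni + 1/nj), where t is the 95% critical value with df = N - g. The formula is not written out in this cheat sheet, and the means, sample sizes, s, and t value below are hypothetical numbers rather than the happiness data; the critical value is passed in by hand because the standard library has no t distribution.

```python
import math

def pairwise_ci(mean_i, mean_j, n_i, n_j, s, t_crit):
    """Confidence interval for the difference of two population means using the pooled s."""
    difference = mean_i - mean_j
    margin = t_crit * s * math.sqrt(1 / n_i + 1 / n_j)
    return difference - margin, difference + margin

# Hypothetical inputs: sample means 8 and 3, sample sizes 3 and 2,
# s = sqrt(6), and t = 2.571 (two-sided 95% critical value for df = 5).
lower, upper = pairwise_ci(8, 3, 3, 2, math.sqrt(6), 2.571)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print("contains 0:", lower <= 0 <= upper)
```

With these made-up inputs the interval contains 0, so the two population means could not be declared different.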
Effects of violating assump­tions
When the sample sizes are large and the ratio of the largest standard deviation to the smallest is less than 2, these procedures are robust to violations of these assump­tions.
If the ratio of the largest standard deviation to the smallest exceeds 2, use the confidence interval formulas that use separate standard deviations for the groups.
Tukey Multiple Comparison
Ex. Groups: Very Happy, Pretty Happy
Difference of means: μ1 - μ2
Separate 95% CI: (0.7, 5.3)
Tukey 95% multiple comparison CI: (0.3, 5.7)
The Tukey intervals hold with an overall confidence level of 95%; this confidence applies to all intervals simultaneously. Tukey intervals are wider than the separate CIs because each one uses a higher individual confidence level to achieve 95% for all intervals together.
The Tukey confidence interval for μ1 - μ2 contains only positive values, so we infer that μ1 > μ2: the mean number of good friends is higher for very happy than for pretty happy people (but maybe barely so).
ANOVA and Regression









 

Two-Way ANOVA

Difference Between 1 and 2 way ANOVA
One-way ANOVA analyzes the relationship between the mean of a quantitative response variable and the groups that are categories of a single factor
Two-way ANOVA analyzes the relationship between the mean of a quantitative response variable and two categorical explanatory variables (factors)
Null Hypothesis: In two-way ANOVA, a null hypothesis states that the population means are the same in each category of one factor, at each fixed level of the other factor.
Ex. H0: Mean corn yield is equal for plots at the low and high levels of manure, for each fixed level of fertilizer. From the output, you can obtain the F-test statistic of 6.88 with its corresponding P-value of 0.018. The small P-value indicates strong evidence that the mean corn yield depends on manure level.
No intera­ction between two factors means that the effect of either factor on the response variable is the same at each category of the other factor.
Usually, test the hypothesis of no interaction first
If the evidence of intera­ction is not strong (that is, if the P-value is not small), then test the main effects hypotheses and/or construct confidence intervals for those effects.
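What "no interaction" means can be sketched by comparing the effect of one factor at each level of the other. The cell means below are invented corn yields, not the source's data, and the helper names are hypothetical. This only describes sample means; judging whether an interaction is real still requires the F test.

```python
def manure_effects(cell_means):
    """Effect of manure (high minus low) at each fixed level of fertilizer."""
    return {fert: cell_means[("high", fert)] - cell_means[("low", fert)]
            for fert in ("low", "high")}

def interaction_contrast(cell_means):
    """Difference between the manure effects at high and low fertilizer; 0 means no interaction."""
    effects = manure_effects(cell_means)
    return effects["high"] - effects["low"]

# Keys are (manure level, fertilizer level); values are hypothetical mean yields.
no_interaction = {("low", "low"): 10.0, ("high", "low"): 12.0,
                  ("low", "high"): 13.0, ("high", "high"): 15.0}
with_interaction = {("low", "low"): 10.0, ("high", "low"): 12.0,
                    ("low", "high"): 13.0, ("high", "high"): 18.0}

print(manure_effects(no_interaction), interaction_contrast(no_interaction))
print(manure_effects(with_interaction), interaction_contrast(with_interaction))
```

In the first table manure adds the same amount at both fertilizer levels; in the second its effect depends on the fertilizer level.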

Repeated Measures ANOVA

Sum of Squares in One Way Repeated Measures
Independent Groups:
SS Groups (df = g - 1)
SS Error (df = N - g)
Dependent Groups:
SS Groups (df = g - 1)
SS Subjects (df = subj - 1)
SS Error (df = N - g - subj + 1, where N = g × subj is the total number of observations)
In repeated measures (dependent groups) ANOVA, the variability of the subjects is calculated (as if it were a factor) and is not included in the error sum of squares.
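The partition described above can be sketched as follows, assuming every subject is measured once under each of the g conditions. The scores are made up, and `repeated_measures_ss` is a hypothetical helper name.

```python
def repeated_measures_ss(scores):
    """scores[s][c] is the score of subject s under condition c."""
    subj = len(scores)
    g = len(scores[0])
    n_total = subj * g
    grand_mean = sum(sum(row) for row in scores) / n_total
    condition_means = [sum(row[c] for row in scores) / subj for c in range(g)]
    subject_means = [sum(row) / g for row in scores]
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_groups = subj * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = g * sum((m - grand_mean) ** 2 for m in subject_means)
    # Subject variability is removed from the error term.
    ss_error = ss_total - ss_groups - ss_subjects
    return {
        "ss_groups": ss_groups, "df_groups": g - 1,
        "ss_subjects": ss_subjects, "df_subjects": subj - 1,
        "ss_error": ss_error, "df_error": n_total - g - subj + 1,
    }

# Hypothetical data: 4 subjects, each measured under 3 conditions.
scores = [[3, 5, 7], [2, 4, 9], [4, 6, 8], [1, 3, 8]]
result = repeated_measures_ss(scores)
print(result)
```

Treating the same scores as independent groups would leave the subject sum of squares inside the error term, which makes the error sum of squares larger.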
A very important assumption underlying repeated measures ANOVA is sphericity and, relatedly, compound symmetry. When this assumption is violated, the P-values tend to be too small. A Greenhouse-Geisser adjustment to the dfs compensates for potential violations of this assumption.
Two-factor studies often have different (i.e., indepe­ndent) samples on one of the factors and the same (i.e., dependent) samples on the other factor. The factor with different groups of subjects is called the “betwe­en-­sub­jects” factor and the factor with repeated measures is called the “withi­n-s­ubj­ects” factor.