
AP Statistics Unit 2 Cheat Sheet (DRAFT) by

AP Statistics Unit 2 Study Guide

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Terms to Know

scatterplot
displays the relationship between two numerical variables
correlation coefficient "r"
measures the strength and direction of the linear relationship between the x-variable and y-variable
least-squares regression line (LSRL)
line of best fit for the scatterplot; minimizes the sum of the squared residuals (vertical deviations of the points from the line)
explanatory variable
the variable used to explain or predict the other variable (x); independent
response variable
the variable that responds to the other variable (y); dependent
extrapolation
unreliable; using the LSRL to predict values outside the range of the original data set
outliers
points that are far away from the LSRL relative to other points
influential points
points that significantly impact the slope of the LSRL
lurking variable
an outside variable, not included in the analysis, that causes both x and y to change
residual
y - ŷ
coefficient of determination "r^2"
r^2 (written as a percent) of the variation in the y-variable can be explained by the approximate linear relationship between the x-variable and y-variable
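The terms above can be tied together with a short calculation. This is a minimal sketch in plain Python; the function name `lsrl` and the five data points are invented for illustration and do not come from the study guide.

```python
from math import sqrt

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = y_bar - b * x_bar       # y-intercept
    r = sxy / sqrt(sxx * syy)   # correlation coefficient
    return a, b, r

# Hypothetical data, made up for this example only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # y - y-hat for each point

print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
```

For this made-up data the slope is 0.6, the intercept is 2.2, and r^2 is 0.6, so about 60% of the variation in y is explained by the linear relationship with x.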

Strength of "r" (Correlation Coefficient)

legitimate values
[-1, 1]
weak
(-0.5, 0) U (0, 0.5)
moderate
(-0.8, -0.5) U (0.5, 0.8)
strong
[-1, -0.8) U (0.8, 1]
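As a quick check, the intervals can be turned into a small lookup. This sketch assumes the three intervals carry the conventional labels weak, moderate, and strong. The intervals leave the exact boundary values 0.5 and 0.8 unassigned, so placing them in the stronger category is also an assumption. The function name `strength_of_r` is hypothetical.

```python
def strength_of_r(r):
    """Label the strength of a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)          # strength ignores the sign (direction)
    if size == 0:
        return "none"
    if size < 0.5:
        return "weak"
    if size < 0.8:         # assumption: exactly 0.5 counts as moderate
        return "moderate"
    return "strong"        # assumption: exactly 0.8 counts as strong

print(strength_of_r(0.3))    # weak
print(strength_of_r(-0.65))  # moderate
print(strength_of_r(-0.9))   # strong
```

The sign of r gives the direction (positive or negative) separately from the strength.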

LSRL Example

Desiree is interested to see if students who consume more caffeine tend to study more as well. She randomly selects 20 students at her school and records their caffeine intake (mg) and the number of hours spent studying. A scatterplot of the data showed a linear relationship.

This is computer output from a least-squares regression analysis on the data.

LSRL Example Interpretations

find the LSRL
ŷ = 2.544 + 0.164x
identify the variables
x = amount of caffeine intake (mg); y = number of hours spent studying
interpret the slope
for each increase of 1 mg in caffeine intake, there is a predicted increase of 0.164 hours spent studying
identify the coefficient of determination
r^2 = 0.60032 (60.032%)
interpret the coefficient of determination
60.032% of the variation in the number of hours spent studying can be explained by the approximate linear relationship with caffeine intake
find the correlation coefficient
r = 0.7748 (the positive square root of r^2, because the slope is positive)
interpret the correlation coefficient
there is a moderately strong, positive, linear relationship between caffeine intake and the number of hours spent studying
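The arithmetic behind these answers can be checked with a few lines of Python. This sketch uses the intercept and slope from the example and treats the reported 60.032% as the proportion 0.60032. The 50 mg student is hypothetical, and the prediction is only meaningful if 50 mg falls inside the range of the original data, which is not given here.

```python
from math import sqrt

INTERCEPT = 2.544    # from the LSRL in the example
SLOPE = 0.164        # predicted hours of study per mg of caffeine
R_SQUARED = 0.60032  # 60.032% written as a proportion

def predict_hours(caffeine_mg):
    """Predicted hours spent studying: y-hat = 2.544 + 0.164x."""
    return INTERCEPT + SLOPE * caffeine_mg

# r takes the same sign as the slope, so use the positive square root
r = sqrt(R_SQUARED)

# Hypothetical student who consumes 50 mg of caffeine
predicted = predict_hours(50)

print(f"r = {r:.4f}")
print(f"predicted hours for 50 mg: {predicted:.3f}")
```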


slope of LSRL
for each increase in the "x-variable" of one "x-unit", there is a predicted "increase/decrease" in the "y-variable" of "b constant" "y-units"
correlation coefficient
there is a "strength", "direction", linear relationship between "x-variable" and "y-variable"
coefficient of determination
"r^2"% of the variation in the "y-variable" can be explained by the approximate linear relationship between "x-variable" and "y-variable"
residual
the actual "y-variable" is "residual" "y-units" "above/below" the predicted "y-variable"
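The residual sentence frame can be filled in mechanically. This is a rough sketch; the function name `interpret_residual` and the actual value of 12 hours are hypothetical, and the predicted value 10.744 assumes the example line ŷ = 2.544 + 0.164x evaluated at a hypothetical 50 mg.

```python
def interpret_residual(actual, predicted, y_name, y_unit):
    """Fill in the residual template: residual = actual - predicted."""
    residual = actual - predicted
    direction = "above" if residual >= 0 else "below"
    return (f"the actual {y_name} is {abs(residual):.3f} {y_unit} "
            f"{direction} the predicted {y_name}")

# Hypothetical student: studied 12 hours, predicted 10.744 hours
sentence = interpret_residual(12, 10.744, "time spent studying", "hours")
print(sentence)
```

A positive residual means the point lies above the line (the model underpredicted); a negative residual means it lies below.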

Residuals and Residual Plots

the sum of the residuals from the LSRL is always zero
residual = observed - predicted (y - ŷ)
a residual plot shows whether a linear model is appropriate for the relationship between two variables
if there is no pattern in the points on the residual plot, the linear model is appropriate
if there is a pattern in the points on the residual plot, the linear model is not appropriate
when the residual plot shows a pattern, you can transform the data and refit the line until the residual plot looks random
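Both facts can be seen in a small example. The sketch below fits a least-squares line to deliberately curved data (y = x^2, invented for illustration) and lists the residuals; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical curved data: y = x^2
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # observed - predicted

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")
print("sum of residuals:", round(sum(residuals), 9))
```

The residuals still sum to zero, but they run positive, negative, negative, negative, positive. That U-shape is the kind of pattern that signals a linear model is not appropriate.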

Residual Plot Examples

the top residual plot shows that a linear model is appropriate because the points are randomly scattered, while the bottom residual plot shows that a linear model is not appropriate because the points follow a pattern

Transforming Non-Linear Data

x & log y
log x & log y
x & sqrt y
x & 1/y
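To illustrate the first option (x and log y), the sketch below uses made-up exponential data, y = 2 * 3^x, and compares the correlation before and after taking the log of y. The helper name `correlation` and the data are assumptions for illustration only.

```python
from math import log10, sqrt

def correlation(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical exponential data: y = 2 * 3^x
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * 3 ** x for x in xs]

r_raw = correlation(xs, ys)                      # x vs y (curved)
r_log = correlation(xs, [log10(y) for y in ys])  # x vs log y (straight)

print(f"r for (x, y):     {r_raw:.4f}")
print(f"r for (x, log y): {r_log:.4f}")
```

Because log y = log 2 + x log 3 is exactly linear in x, the transformed data falls on a straight line. With real data, a residual plot of the transformed fit is still the way to judge whether the transformation worked.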

Correlation Doesn't Imply Causation

If we collect data for the total number of Master’s degrees issued by universities each year and the total box office revenue generated each year, we would find that the two variables are highly correlated.

Correlation Doesn't Imply Causation Explanation

Does this mean that issuing more Master’s degrees is causing the box office revenue to increase each year? Not quite. The more likely explanation is that the global population has been increasing each year, so both the number of Master’s degrees issued and the number of people attending movies increase each year at roughly similar rates. Although these two variables are correlated, one does not cause the other.