Cheatography

# AP Statistics Unit 2 Cheat Sheet (DRAFT) by kayheartsuu

AP Statistics Unit 2 Study Guide

This is a draft cheat sheet. It is a work in progress and is not finished yet.

### Terms to Know

| Term | Definition |
| --- | --- |
| scatterplot | displays the relationship between two numerical variables |
| correlation coefficient "r" | measures the strength and direction of the linear relationship between the x-variable and the y-variable |
| least-squares regression line (LSRL) | line of best fit for the scatterplot; minimizes the sum of the squared deviations of the points from the line |
| explanatory variable | the x-variable; helps explain or predict changes in the response variable |
| response variable | the y-variable; responds to the explanatory variable (the dependent variable) |
| extrapolation | using the LSRL to predict values outside the range of the original data set; not reliable |
| outliers | points that are far away from the LSRL relative to the other points |
| influential points | points that significantly change the slope of the LSRL |
| lurking variable | an outside variable that causes both x and y to change |
| residual | y - ŷ (actual value minus predicted value) |
| coefficient of determination "r^2" | r^2, written as a percent, is the share of the variation in the y-variable that can be explained by the approximate linear relationship between the x-variable and the y-variable |

### Strength of "r" (Correlation Coefficient)

| Strength | Values of r |
| --- | --- |
| legitimate values | [-1, 1] |
| none | 0 |
| weak | (-0.5, 0) ∪ (0, 0.5) |
| moderate | (-0.8, -0.5) ∪ (0.5, 0.8) |
| strong | [-1, -0.8) ∪ (0.8, 1] |
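The cutoffs above can be turned into a small lookup. This is a minimal sketch, not an official rule: the function name `correlation_strength` is made up for illustration, and because the intervals above leave exactly ±0.5 and ±0.8 unassigned, the sketch assumes those boundary values take the stronger label.

```python
def correlation_strength(r: float) -> str:
    """Label the strength of r using the cutoffs from this cheat sheet."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)  # strength ignores direction (the sign of r)
    if size == 0:
        return "none"
    if size < 0.5:
        return "weak"
    if size < 0.8:  # assumption: exactly 0.5 counts as moderate
        return "moderate"
    return "strong"  # assumption: exactly 0.8 counts as strong


print(correlation_strength(0.7748))  # moderate
print(correlation_strength(-0.92))   # strong
```

The sign of r gives the direction separately: positive r means a positive association, negative r a negative one.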

### LSRL Example

Desiree wants to see whether students who consume more caffeine also tend to study more. She randomly selects 20 students at her school and records their caffeine intake (mg) and the number of hours spent studying. A scatterplot of the data showed a linear relationship.

This is computer output from a least-squares regression analysis on the data.

### LSRL Example Interpretations

| Task | Answer |
| --- | --- |
| find the LSRL | ŷ = 2.544 + 0.164x |
| identify the variables | x = caffeine intake (mg); y = number of hours spent studying |
| interpret the slope | for each additional 1 mg of caffeine intake, the predicted number of hours spent studying increases by 0.164 |
| identify the coefficient of determination | r^2 = 0.60032 (60.032%) |
| interpret the coefficient of determination | 60.032% of the variation in the number of hours spent studying can be explained by the approximate linear relationship with caffeine intake |
| find the correlation coefficient | r = 0.7748 |
| interpret the correlation coefficient | there is a moderately strong, positive, linear relationship between caffeine intake and the number of hours spent studying |
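The arithmetic behind these answers can be checked with a short sketch. It uses only the LSRL coefficients from the example and takes r^2 as 0.60032 (60.032%). The helper names and the student with 10 mg of caffeine and 5 hours of studying are hypothetical; the original data range is not shown, so whether 10 mg counts as extrapolation is unknown.

```python
import math

INTERCEPT = 2.544  # predicted hours when caffeine intake is 0 mg
SLOPE = 0.164      # predicted change in hours per additional mg


def predict_hours(caffeine_mg: float) -> float:
    """Predicted hours spent studying from the example's LSRL."""
    return INTERCEPT + SLOPE * caffeine_mg


def residual(caffeine_mg: float, actual_hours: float) -> float:
    """Residual = actual y minus predicted y."""
    return actual_hours - predict_hours(caffeine_mg)


r_squared = 0.60032
# r takes the sign of the slope; the slope is positive, so use the positive root.
r = math.sqrt(r_squared)

print(round(predict_hours(10), 3))  # 4.184
print(round(residual(10, 5), 3))    # 0.816 (hypothetical student)
print(round(r, 4))                  # 0.7748
```

A positive residual such as 0.816 would mean the student studied 0.816 hours more than the line predicts.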

### Interpretations

| Statistic | Interpretation template |
| --- | --- |
| slope of LSRL | for each increase of one [x-unit] in the [x-variable], there is a predicted [increase/decrease] of [slope b] [y-units] in the [y-variable] |
| correlation coefficient | there is a [strength], [direction], linear relationship between the [x-variable] and the [y-variable] |
| coefficient of determination | [r^2]% of the variation in the [y-variable] can be explained by the approximate linear relationship between the [x-variable] and the [y-variable] |
| residual | the actual [y-variable] is [residual] [y-units] [above/below] the predicted [y-variable] |

### Residuals and Residual Plots

- The sum of the residuals from the LSRL is always zero.
- residual (error) = observed - predicted
- A residual plot shows whether a linear model is appropriate for the relationship between two variables.
- If the points on the residual plot show no pattern, the model is appropriate.
- If the points on the residual plot show a pattern, the model is not appropriate.
- When the residual plot shows a pattern, you can transform the data until the residual plot looks random.
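The claim that the residuals sum to zero can be verified on a small data set. The sketch below fits the LSRL with the standard formulas (slope = Sxy / Sxx, intercept = ȳ - slope · x̄); the five data points are made up for illustration and the function names are hypothetical.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(xs, ys):
    """Residual for each point: observed y minus predicted y."""
    a, b = fit_lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

res = residuals(xs, ys)
print([round(e, 2) for e in res])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(res)) < 1e-9)        # True
```

Plotting `res` against `xs` gives the residual plot; the sum is compared with a small tolerance because floating-point arithmetic rarely lands on exactly zero.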

### Residual Plot Examples

The top residual plot shows that the linear model is appropriate because the points are randomly scattered, while the bottom residual plot shows that the model is not appropriate because the points follow a pattern.

### Transforming Non-Linear Data

Re-plot the data using one of these pairs, then check the residual plot again:

- x and log y
- log x and log y
- x and sqrt y
- x and 1/y
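As a rough illustration of the "x and log y" transformation, the sketch below uses made-up data that grows exponentially (y = 3 · 2^x). The raw data are curved, but x against log y is a straight line, so the correlation moves to essentially 1. The data and the `correlation` helper are assumptions for illustration; in practice the residual plot, not r alone, is what confirms the transformed model is appropriate.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up exponential data: y doubles each time x increases by 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [3 * 2 ** x for x in xs]

log_ys = [math.log10(y) for y in ys]  # transform y only: "x and log y"

r_raw = correlation(xs, ys)      # curved relationship, r below 1
r_log = correlation(xs, log_ys)  # straight line after the transform

print(round(r_raw, 3))
print(round(r_log, 3))
```

The other pairs in the list work the same way: replace `math.log10(y)` with `math.sqrt(y)` or `1 / y`, or also take the log of x.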

### Correlation Doesn't Imply Causation

If we collect data for the total number of Master's degrees issued by universities each year and the total box office revenue generated each year, we would find that the two variables are highly correlated.

### Correlation Doesn't Imply Causation Explanation

Does this mean that issuing more Master's degrees causes box office revenue to increase each year? Not quite. The more likely explanation is that the global population has been increasing each year, so both the number of Master's degrees issued and the number of people attending movies increase along with it. Population is a lurking variable: although the two variables are correlated, one does not cause the other.
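A small sketch can show how a lurking variable produces this kind of correlation. All numbers below are invented for illustration and are not real degree or box office figures: both series are built from the same population values plus unrelated wiggles, and neither series is computed from the other.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented yearly values of the lurking variable (population).
population = [6.0, 6.2, 6.4, 6.6, 6.8, 7.0, 7.2, 7.4]

# Each series depends on population only, plus its own small wiggle.
degree_wiggle = [3, -2, 1, -3, 2, -1, 3, -2]
revenue_wiggle = [0.1, -0.2, 0.2, -0.1, 0.1, 0.2, -0.2, 0.1]
degrees = [100 * p + d for p, d in zip(population, degree_wiggle)]
revenue = [5 * p + w for p, w in zip(population, revenue_wiggle)]

r_degrees_revenue = correlation(degrees, revenue)
print(round(r_degrees_revenue, 3))  # close to 1, with no causal link between them
```

The strong correlation comes entirely from the shared dependence on `population`, which mirrors the explanation above.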