AP Stat 01 Describe Data Cheat Sheet (DRAFT) by

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Collect Data

type of variable
categorical vs quantitative (continuous vs discrete)
type of descriptive methods
tabular, graphical, numerical
Tabular
n, f, rf, 100rf, cf, rcf, 100rcf
graphical
relations: bar, pie, dot, stem-and-leaf, histogram, cumulative freq
numeric
precise/inference, dull, complicated
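
The tabular columns above (f, rf, 100rf, cf, rcf) can be built in a few lines. This is a minimal sketch using only the Python standard library; the function name `frequency_table` and the data are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Return rows of (value, f, rf, cf, rcf), sorted by value."""
    counts = Counter(values)
    n = len(values)                      # n: total number of observations
    ordered = sorted(counts)
    freqs = [counts[v] for v in ordered] # f: frequency of each value
    cum = list(accumulate(freqs))        # cf: cumulative frequency
    # rf = f / n and rcf = cf / n; multiply by 100 for percentages
    return [(v, f, f / n, cf, cf / n) for v, f, cf in zip(ordered, freqs, cum)]

scores = [2, 3, 3, 4, 4, 4, 5, 5]        # hypothetical data
table = frequency_table(scores)
for value, f, rf, cf, rcf in table:
    print(f"{value}: f={f} rf={rf:.3f} 100rf={100 * rf:.1f} cf={cf} rcf={rcf:.3f}")
```

The last row always has cf = n and rcf = 1, which is a quick check that the table is complete.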

graph the data

qualitative
bar, pie
quantitative
dot plot, stemplot, histogram, cumulative freq chart, boxplot
Examining graph
center (mean, median, mode), spread (range, std, variance), shape (symmetric, skewed)
pattern/deviations
cluster/gap, outliers
dotplot
spread, shape, approx center
stemplot
shape, spread, center
histogram
f vs rf, shape/center, large dataset, error bar for spread
cumulative freq charts
S shaped, T(skewed) shape, meaningful order

Central tendency - mean, median, mode

summarizing a distribution
population/sample, center/spread/shape
mean
mu = population mean, x bar = sample mean
median
for skewed data, odd/even sample size
mode
number with highest freq
symmetrical
mean=median=mode
left skewed
mode>median>mean
right skewed
mean>median>mode
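
The ordering of mean, median, and mode under skew can be checked on small made-up datasets. This sketch uses Python's `statistics` module; both lists are invented for illustration, and the ordering is a typical pattern for unimodal data rather than a guarantee for every dataset.

```python
import statistics

# Hypothetical right-skewed data: a long tail of large values
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
mean = statistics.mean(right_skewed)
median = statistics.median(right_skewed)
mode = statistics.mode(right_skewed)
print("right skewed:", mean, median, mode)   # the tail pulls the mean up

# Mirror image of the same data gives a left-skewed set
left_skewed = [15 - x for x in right_skewed]
l_mean = statistics.mean(left_skewed)
l_median = statistics.median(left_skewed)
l_mode = statistics.mode(left_skewed)
print("left skewed:", l_mean, l_median, l_mode)  # the tail pulls the mean down
```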
 

variance/spread - range, IQR, STD

variance
spread from mean
range
largest - smallest measurement, affected by outliers
IQR
interquartile range, IQR = Q3-Q1, not affected by outliers, pair with median
STD
standard deviation, square root of variance, affected by outliers, >=0
variance
average of the squared deviations from the mean
population variance
N, sigma, mu
sample variance
n-1, x bar, s
mean/STD, median/IQR
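
To make the N versus n-1 distinction concrete, here is a small sketch that computes range, variance, and standard deviation by hand. The helper names and the data are assumptions for illustration; `statistics.pstdev` and `statistics.stdev` from the standard library give the same results.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements

def population_variance(values):
    """Average squared deviation from mu, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from x bar, dividing by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data_range = max(data) - min(data)     # largest - smallest
sigma_sq = population_variance(data)   # sigma squared
s_sq = sample_variance(data)           # s squared
sigma = math.sqrt(sigma_sq)            # population STD
s = math.sqrt(s_sq)                    # sample STD

print(data_range, sigma_sq, sigma, s_sq, s)
```

Dividing by n-1 always gives a slightly larger value than dividing by N, so s >= sigma when both are computed from the same numbers.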

Position - quartile, percentile, standardized score

percentile
order, divide into 100 equal parts, count kth percentile Pk
quartile
order, divide into 4 equal parts (median calc), count kth quartile Qk, P25=Q1, P50=Q2..
z score
standardized score, (x-mean)/std, compare datasets with different scales, eg temperature in north vs south city
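
As a sketch of the north vs south temperature comparison, the z score puts values from differently scaled datasets on a common footing. The temperatures below are invented, the helper `z_score` is hypothetical, and each list is treated as a full population (hence `pstdev`); with sample data one would use `stdev` instead.

```python
import statistics

def z_score(x, values):
    """Standardized score: (x - mean) / std, treating values as a population."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return (x - mu) / sigma

north = [10, 12, 14, 16, 18]   # hypothetical daily temperatures, north city
south = [24, 25, 26, 27, 28]   # hypothetical daily temperatures, south city

z_north = z_score(18, north)   # 18 degrees relative to the north city
z_south = z_score(27, south)   # 27 degrees relative to the south city
print(round(z_north, 3), round(z_south, 3))
```

Although 27 is the higher raw temperature, 18 has the larger z score here, so it is the more unusual day relative to its own city.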

Graphing univariate data

graphical summaries
Y scale: misleading manipulation
box plots
box (Q1 to Q3, median line at Q2) and whiskers (below Q1, above Q3), whiskers extend at most 1.5 IQR (IQR = Q3-Q1), L=Q1-1.5IQR, U=Q3+1.5IQR. Points >U or <L are outliers
 
based on position, identify outliers and general shape (skewed or not)
 
calc: Q1, Median, Q3, IQR, L, U
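
The box plot calculation (Q1, median, Q3, IQR, L, U) can be sketched as follows. Quartile conventions differ between textbooks and software; this sketch assumes the median-of-halves method (the middle value is left out of both halves when n is odd). The function name and the data are made up for illustration.

```python
import statistics

def box_plot_summary(values):
    """Quartiles by median-of-halves, plus 1.5*IQR fences and outliers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]               # values below the median position
    upper = data[half + (n % 2):]     # skip the middle value when n is odd
    q1 = statistics.median(lower)
    q2 = statistics.median(data)
    q3 = statistics.median(upper)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr        # L
    high_fence = q3 + 1.5 * iqr       # U
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {"Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
            "L": low_fence, "U": high_fence, "outliers": outliers}

summary = box_plot_summary([1, 3, 4, 5, 5, 6, 7, 8, 9, 20])  # hypothetical data
print(summary)
```

In this made-up dataset the value 20 lies above U, so it would be plotted as a separate point rather than covered by the upper whisker.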
shift unit +a
center (mean, median) shifts by a; spread (range, std, IQR, variance) not affected
enlarge or shrink unit,*b
mean and median multiplied by b; range, std, IQR multiplied by |b|; variance multiplied by b^2
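
A quick numeric check of the shift and rescale rules, using invented data and invented constants a = 10 and b = 3. This is only a sketch with the standard `statistics` module; population formulas are used, but the same pattern holds for the sample versions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
a, b = 10, 3                      # hypothetical shift and scale factor

shifted = [x + a for x in data]   # shift unit: +a
scaled = [x * b for x in data]    # enlarge unit: *b

def describe(values):
    """Return (mean, range, population std) for a list of numbers."""
    return (statistics.mean(values),
            max(values) - min(values),
            statistics.pstdev(values))

base_mean, base_range, base_std = describe(data)
shift_mean, shift_range, shift_std = describe(shifted)
scale_mean, scale_range, scale_std = describe(scaled)

print("original:", base_mean, base_range, base_std)
print("shifted: ", shift_mean, shift_range, shift_std)   # only the center moves
print("scaled:  ", scale_mean, scale_range, scale_std)   # center and spread scale
```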
Compare distri­butions
center, spread, shape
 
outliers or unusual values, cluster/gap
 
context of the question
 
dot plot, stemplot, histogram, frequency polygon
Avoid simply listing the stats (center, std, and shape); instead, make a clear comparative statement.
 

Bivariate data

Scatter plot
shape: linear, non-linear, no relation
 
direction: positive or negative linear relation
 
strength of linear relation: close to the line
Numeric methods
correlation coefficient
degree and direction of linear relation of two quantitative variables (x,y)
 
rho and r, [-1,+1]
 
0, 0.1, 0.5, 0.85, 1
least squares regression line
formula
Y = a + bX + e
Y
dependent/response variable
X
independent/explanatory variable
a
y intercept of line
b
slope of the line
e
random error, residual error
predicted value
y hat
residual error
e
least square regression
minimize the sum of squares of residual error
 
line of best fit passes through (X bar, Y bar), slope = r(Sy/Sx)
coefficient of determination
R squared, percent of variation in Y explained by the linear relation with X
 
[0,1]
influential point
point that strongly affects the correlation coefficient or the regression line
Outlier
may be an influential point
residual plot
should be random; otherwise the linear fit is not the best
transformation to fit linear
log, sqrt, reciprocal, square, power
1 calc slope, intercept, write formula, plot the linear line
2 make a prediction, calc residual error
3 calc correlation coefficient r = SSxy/sqrt(SSxx * SSyy), coefficient of determination = r^2
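
The three steps above can be sketched in plain Python. The function name `least_squares` and the data points are invented for illustration; SSxx, SSyy, and SSxy are the sums of squared deviations and cross deviations from the means.

```python
import math

def least_squares(xs, ys):
    """Return (a, b, r) for the least squares line y hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx                     # slope
    a = y_bar - b * x_bar                 # intercept: line passes through (x bar, y bar)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)  # correlation coefficient
    return a, b, r

xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values

# Step 1: slope and intercept
a, b, r = least_squares(xs, ys)
print(f"y hat = {a:.2f} + {b:.2f} x")

# Step 2: prediction and residual (observed - predicted) at x = 4
y_hat = a + b * 4
residual = ys[xs.index(4)] - y_hat
print(f"prediction {y_hat:.2f}, residual {residual:.2f}")

# Step 3: correlation coefficient and coefficient of determination
r_squared = r ** 2
print(f"r = {r:.4f}, r^2 = {r_squared:.2f}")
```

The residuals of a least squares line sum to zero, which is a handy sanity check after step 2.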

stat and interpretation

categorical data

marginal and joint freq of two way tables
contingency (joint) table
r*c
marginal
row totals, column totals, grand total
conditional relative frequency
association
compare observed counts with row total * col total / grand total (the count expected if there is no association)
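
A small sketch of the two-way table calculations: marginal totals, conditional relative frequencies by row, and the expected count row total * col total / grand total for each cell. The 2 x 2 counts are made up for illustration.

```python
# Hypothetical 2 x 2 contingency table of observed joint frequencies
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]         # marginal: row totals
col_totals = [sum(col) for col in zip(*observed)]   # marginal: column totals
grand_total = sum(row_totals)

# Conditional relative frequency: each cell divided by its row total
conditional_by_row = [[cell / total for cell in row]
                      for row, total in zip(observed, row_totals)]

# Expected count for each cell if the two variables were not associated
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

print("row totals:", row_totals, "col totals:", col_totals, "grand:", grand_total)
print("conditional by row:", conditional_by_row)
print("expected:", expected)
```

Here the observed counts differ noticeably from the expected counts, and the row-wise conditional frequencies differ between rows, both of which suggest an association in this invented table.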