Cheatography

# AP Stat 01 Describe Data Cheat Sheet (DRAFT) by taotao

This is a draft cheat sheet. It is a work in progress and is not finished yet.

### Collect Data

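| Term | Notes |
| --- | --- |
| Type of variable | categorical vs quantitative (continuous vs discrete) |
| Types of descriptive methods | tabular, graphical, numerical |
| Tabular | n, f, rf, 100rf, cf, rcf, 100rcf (count, frequency, relative frequency, percent, cumulative frequency, relative cumulative frequency, cumulative percent) |
| Graphical | shows relations: bar, pie, dot, stem-and-leaf, histogram, cumulative frequency |
| Numerical | precise, supports inference; dull, complicated |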
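The tabular quantities listed above (f, rf, cf, rcf) can be computed directly from raw data. The following is a minimal sketch; the function name `frequency_table` and the sample `scores` are made up for illustration.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (value, f, rf, cf, rcf), ordered by value."""
    counts = Counter(values)
    n = len(values)
    ordered = sorted(counts)
    freqs = [counts[v] for v in ordered]
    cum_freqs = list(accumulate(freqs))  # running total of f
    return [
        (v, f, f / n, cf, cf / n)
        for v, f, cf in zip(ordered, freqs, cum_freqs)
    ]


# Hypothetical sample data, chosen only for illustration.
scores = [2, 3, 3, 4, 4, 4, 5, 5]
for row in frequency_table(scores):
    print(row)
```

Multiplying the rf and rcf columns by 100 gives the percent columns (100rf, 100rcf).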

### graph the data

| Term | Notes |
| --- | --- |
| Qualitative | bar chart, pie chart |
| Quantitative | dot plot, stemplot, histogram, cumulative frequency chart, boxplot |
| Examining a graph | center (mean, median, mode), spread (range, std, variance), shape (symmetric, skewed) |
| Patterns/deviations | clusters/gaps, outliers |
| Dot plot | spread, shape, approximate center |
| Stemplot | shape, spread, center |
| Histogram | f vs rf, shape/center, large datasets, error bar for spread |
| Cumulative frequency chart | S-shaped, T (skewed) shape, meaningful order |

### Central tendency - mean, median, mode

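| Term | Notes |
| --- | --- |
| Summarizing a distribution | population/sample; center/spread/shape |
| Mean | μ = population mean, x̄ = sample mean |
| Median | preferred for skewed data; the middle value (odd sample size) or the average of the two middle values (even sample size) |
| Mode | value with the highest frequency |
| Symmetrical | mean = median = mode |
| Left skewed | mode > median > mean |
| Right skewed | mean > median > mode |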
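As a quick check of the skew rules, here is a small sketch using Python's standard `statistics` module. The sample `data` is hypothetical and deliberately right skewed: one large value pulls the mean above the median.

```python
import statistics

# Hypothetical right-skewed sample: the value 15 sits far to the right.
data = [1, 2, 2, 2, 3, 3, 4, 15]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # even n: average of the two middle values
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
print("right skewed pattern:", mean > median > mode)
```

The median barely moves if 15 is replaced by a larger number, while the mean keeps growing, which is why the median is preferred for skewed data.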

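### variance/spread - range, IQR, STD

| Term | Notes |
| --- | --- |
| Variability (spread) | how far the data spread from the mean |
| Range | largest - smallest measurement; affected by outliers |
| IQR | interquartile range, Q3 - Q1; not affected by outliers; pair with the median |
| STD | standard deviation, the square root of the variance; affected by outliers; always >= 0 |
| Variance | average of the squared deviations from the mean |
| Population variance | σ²: divide by N, deviations taken from μ |
| Sample variance | s²: divide by n - 1, deviations taken from x̄ |

Report mean with STD, median with IQR.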
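The measures of spread above can be sketched with the standard `statistics` module. The sample `data` is hypothetical. Note that quartile conventions differ between textbooks and software; `statistics.quantiles` with its default "exclusive" method is only one of them, so Q1 and Q3 may differ slightly from a hand calculation.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

# Quartiles via the default "exclusive" method (one convention of several).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

pop_var = statistics.pvariance(data)  # divides by N
samp_var = statistics.variance(data)  # divides by n - 1
pop_std = statistics.pstdev(data)     # square root of pop_var
samp_std = statistics.stdev(data)     # square root of samp_var

print("range:", data_range, "IQR:", iqr)
print("population variance/std:", pop_var, pop_std)
print("sample variance/std:", samp_var, samp_std)
```

Dividing by n - 1 instead of N always makes the sample variance slightly larger than the population formula applied to the same numbers.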

### Position - quartile, percentile, standardized score

| Term | Notes |
| --- | --- |
| Percentile | order the data, divide into 100 equal parts, count; kth percentile = Pk |
| Quartile | order the data, divide into 4 equal parts (same calculation as the median), count; kth quartile = Qk; P25 = Q1, P50 = Q2, P75 = Q3 |
| z score | standardized score, z = (x - mean) / std; compares values from datasets with different scales, e.g. temperature in a northern vs a southern city |
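A z score puts values from different scales on a common footing. The sketch below assumes the sample standard deviation is used. The helper `z_score` and both temperature lists are hypothetical.

```python
import statistics


def z_score(x, values):
    """Standardized score of x relative to a sample: (x - mean) / std."""
    return (x - statistics.mean(values)) / statistics.stdev(values)


# Hypothetical daily highs (deg C) for a northern and a southern city.
north = [-5, -2, 0, 1, 3, 6]
south = [22, 24, 25, 26, 27, 29]

z_north = z_score(6, north)   # how unusual is 6 degrees in the north?
z_south = z_score(29, south)  # how unusual is 29 degrees in the south?

print(round(z_north, 2), round(z_south, 2))
```

Whichever day has the larger z score is the more unusual one for its own city, even though the raw temperatures are far apart.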

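### Graphing univariate data

| Term | Notes |
| --- | --- |
| Graphical summaries | Y scale: watch for misleading manipulation |
| Box plot | box (Q1 to Q3, with the median at Q2) and whiskers (lower below Q1, upper above Q3); whiskers reach no farther than 1.5 IQR from the box (IQR = Q3 - Q1); L = Q1 - 1.5 IQR, U = Q3 + 1.5 IQR; a point > U or < L is an outlier |

Avoid simply listing the statistics (center, std, and shape); instead, make a clear comparative statement.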
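The fences L and U from the box plot rule can be computed as follows. This is a sketch under two assumptions: quartiles come from `statistics.quantiles` (default "exclusive" method, which may differ from a textbook's quartile rule), and points beyond the fences are treated as outliers. The helper names and the sample `data` are hypothetical.

```python
import statistics


def fences(values):
    """Return (L, U) = (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values lying below the lower fence or above the upper fence."""
    lower, upper = fences(values)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one suspiciously large value.
data = [2, 4, 4, 4, 5, 5, 7, 30]
print(fences(data))
print(outliers(data))
```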

### Bivariate data

| Term | Notes |
| --- | --- |
| Scatter plot | shape: linear, non-linear, no relation; direction: positive or negative linear relation; strength of linear relation: how close the points lie to the line |
| Numeric methods | correlation coefficient, least squares regression line, coefficient of determination |
| Correlation coefficient | degree and direction of the linear relation between two quantitative variables (x, y); ρ (population) and r (sample); range [-1, +1]; example magnitudes: 0, 0.1, 0.5, 0.85, 1 |
| Least squares regression line formula | Y = a + bX + e |
| Y | dependent/response variable |
| X | independent/explanatory variable |
| a | y intercept of the line |
| b | slope of the line |
| e | random error, residual error |
| Predicted value | ŷ (y hat) |
| Residual error | e = y - ŷ |
| Least squares regression | minimizes the sum of squares of the residual errors |
| Line of best fit | passes through (x̄, ȳ); slope b = r(Sy/Sx) |
| Coefficient of determination | R squared; proportion of the variance of Y explained by X; range [0, 1] |
| Influential point | a point that strongly affects the correlation coefficient |
| Outlier | may be an influential point |
| Residual plot | should look random; otherwise the linear fit is not the best model |
| Transformations to achieve a linear fit | log, sqrt, reciprocal, square, power |

1. Calculate the slope and intercept, write the formula, and plot the line.
2. Make a prediction and calculate the residual error.
3. Calculate the correlation coefficient r = SSxy / sqrt(SSxx * SSyy); the coefficient of determination is r².
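The three steps above can be sketched in plain Python. This is an illustrative implementation, not a prescribed method: the function name `least_squares` and the data `xs`, `ys` are hypothetical. SSxx, SSyy, and SSxy are the sums of squared (or cross) deviations from the means, and the coefficient of determination is taken as the square of r.

```python
import math


def least_squares(xs, ys):
    """Return (a, b, r): intercept, slope, and correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx                     # slope
    a = y_bar - b * x_bar                 # line passes through (x_bar, y_bar)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)  # correlation coefficient
    return a, b, r


# Hypothetical paired data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = least_squares(xs, ys)                    # step 1
y_hat = [a + b * x for x in xs]                    # step 2: predictions
residuals = [y - yh for y, yh in zip(ys, y_hat)]   # step 2: residual errors
r_squared = r ** 2                                 # step 3

print(f"y_hat = {a:.2f} + {b:.2f} x, r = {r:.3f}, r^2 = {r_squared:.3f}")
```

The residuals of a least squares line sum to zero (up to rounding), which is a handy sanity check on a hand calculation.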

stat and interpretation

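### categorical data

| Term | Notes |
| --- | --- |
| Two-way tables | marginal and joint frequencies |
| Contingency (joint) table | r * c (rows by columns) |
| Marginal | row totals, column totals, grand total |
| Conditional | conditional relative frequency |
| Association | compare each observed count with (row total * column total) / grand total |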
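The marginal totals, conditional relative frequencies, and the comparison value (row total * column total) / grand total can be computed as below. This is a sketch only: the function name `two_way_summary` and the `observed` counts are hypothetical.

```python
def two_way_summary(table):
    """Summarize an r * c table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Conditional relative frequency of each cell, given its row.
    row_conditional = [
        [count / total for count in row]
        for row, total in zip(table, row_totals)
    ]
    # Count expected in each cell if the two variables were not associated.
    expected = [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]
    return row_totals, col_totals, grand_total, row_conditional, expected


# Hypothetical 2 * 2 table of counts.
observed = [[30, 10],
            [20, 40]]

rows, cols, total, conditional, expected = two_way_summary(observed)
print(rows, cols, total)
print(conditional)
print(expected)
```

Large gaps between the observed counts and the expected counts suggest an association between the row and column variables.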