Cheatography

# Statistics I Cheat Sheet (DRAFT) by Nathaliemayor

This is a draft cheat sheet. It is a work in progress and is not finished yet.

### Exploratory data analysis

Types of variables: categorical variables are either nominal (no order, e.g. eye colour) or ordinal (ordered, e.g. level of education). Numerical variables are either discrete or continuous.

Numerical summaries: the quantile Q(p) is the value such that a proportion p of the data is smaller than Q(p) and a proportion 1-p is larger. First quartile Q1: p = 0.25. Median Q2: p = 0.5. Third quartile Q3: p = 0.75. The interquartile range IQR = Q3 - Q1 contains 50% of the data. The rank of Q(p) in the sorted data is p(n-1)+1; if it is not an integer, interpolate between the two neighbouring values, weighting by the fractional part of the rank.

Measures of center: MODE: most frequent value. MEDIAN: Q(0.50). MEAN: average, total/n. If the distribution is unimodal and symmetric, mean = median. If it is right skewed, typically mode < median < mean.
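The rank formula p(n-1)+1 can be sketched in a few lines of Python. This is only an illustration: the function name `quantile` and the sample values are made up for the example, and other textbooks and libraries use slightly different rank conventions.

```python
import math


def quantile(data, p):
    """Q(p) using the 1-based rank p*(n-1)+1 with linear interpolation."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    rank = p * (n - 1) + 1          # 1-based position in the sorted data
    lower = math.floor(rank)        # rank of the value just below
    frac = rank - lower             # weight given to the value just above
    if frac == 0:
        return xs[lower - 1]        # integer rank: take the value directly
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


# Hypothetical sample, chosen only to illustrate the computation
data = [2, 4, 4, 5, 7, 9, 10, 12]
q1 = quantile(data, 0.25)   # rank 2.75
q2 = quantile(data, 0.50)   # rank 4.5 (the median)
q3 = quantile(data, 0.75)   # rank 6.25
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

For Q3 the rank is 6.25, so the result lies a quarter of the way between the 6th and 7th sorted values.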

### Statistical inference

Simpson's paradox: with heterogeneous sources, divide the data into more homogeneous subgroups (e.g. by major), because pooling can bias the overall proportion. This is controlling for the confounding factor. Example: men chose the programs that were easiest to enter whereas women chose the more difficult ones. The solution is to use a weighted average of the admission rates.

Sampling the population: the population is what we want to analyse. We want to find the population's parameters; these are true, fixed values but usually unknown. The sample is what we have: a piece of the population chosen randomly. Quantities computed from it are random variables. It should be as large as possible to limit sampling error. A sample carries incomplete information. With a finite population, sampling without replacement can affect the results.

Point estimation, estimators: an estimator is a quantity calculated from the sample. It tries to estimate the true parameter of the population. The estimator is a random variable, whereas the parameter is fixed but unknown. The estimate holds only with a certain degree of confidence: confidence intervals. An estimator is used to estimate a parameter and its uncertainty, e.g. μ. The larger the sample, the more precise the estimate, because the variance decreases with N: for large N the distribution is concentrated around the true value.

Central limit theorem: when we sum n independent random variables from the same distribution, sum/n is a new variable that approximately follows a normal distribution when n is large. Proportions (binomial) are a special case.

Estimating variance: s^2 and s̃^2. If x follows a normal distribution, the suitably scaled variance estimator follows a chi-squared (χ²) distribution with n-1 degrees of freedom, similar to variance estimation.

Confidence intervals: from the central limit theorem, C is a value such that the estimator is in the interval with probability (1-α). The smaller α, the wider the interval. The coverage is not exactly 95/100 but close to that value. If the data follow a normal distribution, use the Student distribution to modify the CI and make it more precise. For proportions, use the estimate p̂. Intervals can be built for the mean, the median, the variance and the difference of means. When 0 is not in the interval for a difference: significant difference.

Theory of estimation: depends on the situation. We can evaluate the quality of an estimator; a good one has no bias.
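The weighted-average fix for Simpson's paradox can be illustrated with a small sketch. All counts below are invented for the example (they are not real admission data), and weighting each program by its total number of applicants is one possible choice of common weights, assumed here for illustration.

```python
# Made-up counts: program -> group -> (applicants, admitted)
admissions = {
    "A (easy)": {"men": (800, 480), "women": (100, 65)},
    "B (hard)": {"men": (200, 40), "women": (900, 225)},
}


def overall_rate(group):
    """Pooled admission rate, ignoring the program (the confounder)."""
    applicants = sum(admissions[prog][group][0] for prog in admissions)
    admitted = sum(admissions[prog][group][1] for prog in admissions)
    return admitted / applicants


def standardized_rate(group):
    """Weighted average of per-program rates, using the same weights
    (total applicants per program) for both groups."""
    weighted_sum = 0.0
    total_weight = 0
    for groups in admissions.values():
        weight = sum(applicants for applicants, _ in groups.values())
        applicants, admitted = groups[group]
        weighted_sum += weight * admitted / applicants
        total_weight += weight
    return weighted_sum / total_weight


for group in ("men", "women"):
    print(group, overall_rate(group), standardized_rate(group))
```

In this toy data women have the higher admission rate inside each program, yet the lower pooled rate, because most of them applied to the hard program. Using common weights removes that composition effect.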
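The CLT-based confidence intervals described in this section can be sketched as follows. This is a minimal sketch assuming large samples, so it uses the normal quantile rather than the Student distribution; the function names `mean_ci` and `proportion_ci` and the sample values are hypothetical.

```python
import math
import statistics


def normal_quantile(alpha):
    """Critical value C such that P(-C < Z < C) = 1 - alpha for standard normal Z."""
    return statistics.NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(sample, alpha=0.05):
    """Approximate (1 - alpha) interval for the mean: x_bar +/- C * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)        # uses the n-1 denominator
    half_width = normal_quantile(alpha) * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def proportion_ci(successes, n, alpha=0.05):
    """Approximate (1 - alpha) interval for a proportion, based on p_hat."""
    p_hat = successes / n
    half_width = normal_quantile(alpha) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical measurements, for illustration only
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1, 4.8, 5.2]
print(mean_ci(sample, alpha=0.05))
print(mean_ci(sample, alpha=0.01))      # smaller alpha gives a wider interval
print(proportion_ci(50, 100))
```

For small samples from a normal distribution, the normal quantile would be replaced by the Student quantile with n-1 degrees of freedom, which makes the interval slightly wider.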