# Statistics Cheat Sheet (DRAFT) by igotanA

This is a cheat sheet of all the concepts that I've learned in my intro to Stats class.

This is a draft cheat sheet. It is a work in progress and is not finished yet.

### Basics of Statistics ~ Definitions

• Statistics ~ Oxford Dictionary: "The science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring... from those in a representative sample."
• Statistics ~ simple: The practice of collecting data from a smaller number of individuals to draw conclusions about a larger number of individuals.
• Descriptive Statistics: Organizing and summarizing data through numerical, tabular, and graphical means.
• Estimation Procedures: The formulas used to obtain estimates (i.e. descriptive statistics) of population traits from the sample data.
• Inference Procedures: The formulas used to quantify the magnitude of error in generalizing from a 'part' (sample) to the 'whole' (population), e.g. t-test, Z-test, chi-square test, Wald test. The inference "statistics" obtained may reflect the reliability of the result or the possibility of error.
• Inference: A conclusion reached based on evidence and reasoning. In statistics, we learn things from samples and infer to the population.
• Population: The entire set of individuals to which the study results are to be extrapolated (generalized).
• Parameter: A numerical summary of a population.
• Sample: A subset of the population.
• Sampling Plan: The methodology for choosing the sample.
• Statistic: A numerical summary of a sample.
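To make the parameter/statistic distinction concrete, here is a minimal sketch using simulated data. The population (10,000 made-up heights), the sample size of 100, and all variable names are assumptions chosen for illustration, not part of the class material.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (made-up data).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population.
population_mean = statistics.fmean(population)

# Sampling plan: a simple random sample of 100 individuals, without replacement.
sample = rng.sample(population, 100)

# Statistic: the same summary, computed from the sample only.
sample_mean = statistics.fmean(sample)

# The gap between the two is the error that inference procedures try to quantify.
sampling_error = sample_mean - population_mean
print(f"parameter={population_mean:.2f} statistic={sample_mean:.2f} "
      f"error={sampling_error:+.2f}")
```

In practice the parameter is unknown, which is why the statistic is used to estimate it.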

### Types of estimates

• Point
• Interval
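A point estimate is a single value, while an interval estimate is a range of plausible values around it. The sketch below computes both for a mean. The function name `mean_with_interval`, the made-up data, and the use of a normal (z) critical value are assumptions for illustration; with a sample this small, a t critical value would usually be preferred.

```python
import math
import statistics
from statistics import NormalDist

def mean_with_interval(sample, confidence=0.95):
    """Return (point_estimate, (lower, upper)) for the population mean.

    Assumption: uses a normal (z) approximation for the critical value.
    """
    n = len(sample)
    point = statistics.fmean(sample)                          # point estimate
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)            # about 1.96 for 95%
    margin = z * standard_error
    return point, (point - margin, point + margin)            # interval estimate

data = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.6, 5.2]  # made-up measurements
point, (low, high) = mean_with_interval(data)
print(f"point estimate: {point:.3f}, 95% interval: ({low:.3f}, {high:.3f})")
```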

### Types of Study Design

• Randomized Controlled Trials
• Cohort Studies
• Case-Control Studies
• Cross-Sectional Studies

### Exploratory Data Analysis

The first step in dealing with data is to organize your thinking about the data.

Exploratory data analysis: The process of using statistical tools and ideas to examine data in order to describe their main features.

Steps of an exploratory data analysis
• Examine each variable first.
• Then study the relationships among the variables.
• Begin with a graph or graphs.
• Add numerical summaries of specific aspects of the data.

### Variables

The characteristics of the individuals within the population.

### Variables Tree

• Categorical Data or Variables: Data can only take on discrete values. It allows for classification of individuals based on some attribute or characteristic.
  • Nominal Data or Variables: Categories apply in name only, with no inherent ordering (e.g. blood type, hair color).
  • Ordinal Data or Variables: Data can be ranked in order but only take on discrete values (e.g. satisfaction score, Glasgow Coma Score).
• Continuous/Measured Data or Variables: These values can be added or subtracted and provide meaningful results.
  • Interval Data or Variables: The difference between each number/value is equal (e.g. temperature in Celsius, IQ score). There is no absolute zero: zero is just another value on the scale and does not mean the quantity is absent.
  • Ratio Data or Variables: These values are on an interval scale with an absolute zero, where zero means none of the quantity is present (e.g. weight, temperature in Kelvin).

### Graphing Variables

• Categorical - Pie Chart: Shows the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percentages for the categories.
• Categorical - Bar Chart: Represents each category as a bar whose height shows the category count or percentage.
• Continuous - Histogram: Counts how many individuals (or what percentage of individuals) fall into each interval.
• Continuous - Boxplot:
  • The Lower Inner Fence (LIF): LIF = Q1 - 1.5 × IQR
  • The Upper Inner Fence (UIF): UIF = Q3 + 1.5 × IQR
  • The Lower Adjacent Value: the actual data value just inside the LIF
  • The Upper Adjacent Value: the actual data value just inside the UIF
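The boxplot fences and adjacent values can be sketched as follows. The function name `boxplot_fences` and the data are made up for illustration, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other software may use a different quartile convention and give slightly different fences.

```python
import statistics

def boxplot_fences(values):
    """Compute inner fences and adjacent values for a boxplot."""
    data = sorted(values)
    # Quartile convention is an assumption; textbooks and packages differ.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lif = q1 - 1.5 * iqr  # Lower Inner Fence
    uif = q3 + 1.5 * iqr  # Upper Inner Fence
    inside = [x for x in data if lif <= x <= uif]
    return {
        "q1": q1, "q3": q3, "iqr": iqr, "lif": lif, "uif": uif,
        "lower_adjacent": inside[0],   # smallest value inside the fences
        "upper_adjacent": inside[-1],  # largest value inside the fences
        "outliers": [x for x in data if x < lif or x > uif],
    }

result = boxplot_fences([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(result)
```

Values outside the fences (here, 30) are the points a boxplot typically draws individually as potential outliers.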

### Tables - Continuous/categorical variables

Advantages :)
• Gives the reader a compact and structured synthesis of information
• Shows a lot of detail in a small amount of space

Disadvantages :(
• Because the reader only sees numbers, the table may not be readily understood without comparing it with other tables

### Skewness

• Right or positively skewed distributions will yield skewness values > 0
• Left or negatively skewed distributions will yield skewness values < 0
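One common way to compute skewness is the moment-based formula m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below uses that definition as an assumption (statistical packages often apply a small-sample correction), and the data sets are made up.

```python
import statistics

def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]     # long tail to the right
left_skewed = [-x for x in right_skewed]     # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed), skewness(left_skewed), skewness(symmetric))
```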

### Kurtosis

Kurtosis is a measure of how "peaked" or "flat" a distribution is. It is usually judged in comparison to a bell-shaped/normal distribution, which has kurtosis = 3.
• Leptokurtic (kurtosis > 3): the distribution has an excess of observations close to the mean together with a few observations far out in the tails, giving it a more peaked appearance with heavier tails than the normal distribution.
• Platykurtic (kurtosis < 3): the observations are spread more evenly away from the mean, giving the distribution a flattened appearance with light tails, as if it really has no tails at all.
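Kurtosis on the scale used here, where a normal distribution scores 3, can be computed as m4 / m2^2. The sketch below assumes that moment-based definition (some software reports "excess kurtosis", which subtracts 3), and both data sets are made up.

```python
import statistics

def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (not excess kurtosis; normal = 3)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

flat = list(range(1, 11))       # evenly spread values, no pronounced peak
peaked = [5] * 20 + [0, 10]     # most values at the mean, a few far away

print(f"flat:   {kurtosis(flat):.2f}")    # below 3
print(f"peaked: {kurtosis(peaked):.2f}")  # above 3
```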

• Probability sampling: Random selection, with an equal chance of selection for every member of the population.
• Non-probability sampling: Selection is not random (e.g. convenience sampling or voluntary self-selection), which increases the likelihood that some participants are selected over others.
• Parametric tests: Reasonable, evidence-supported assumptions must be made about the population distribution. These tests can be used to make strong statistical inferences when data are collected using probability sampling.
• Non-parametric tests: Very few assumptions, if any, are made about the population distribution. These tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.
• Significance level (alpha): The risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
• Statistical power: The probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
• Expected effect size: A standardized indication of how large the expected result of your study will be, usually based on other similar studies.
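Significance level, power, and expected effect size are typically combined to plan a sample size. As a rough sketch, assuming a two-sided comparison of two group means with a normal approximation, the per-group sample size is n = 2 × ((z for alpha/2 + z for power) / d)^2, where d is the standardized effect size. The function name and this choice of design are assumptions for illustration; exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Assumption: normal approximation with a standardized effect size d.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A medium standardized effect (d = 0.5) at alpha = 5% and power = 80%.
print(sample_size_per_group(0.5))
# Smaller expected effects need much larger samples.
print(sample_size_per_group(0.2))
```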