Statistics Cheat Sheet (DRAFT) by

This is a cheat sheet of all the concepts that I've learned in my intro to Stats class.

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Basics of Statistics ~ Definitions

Statistics ~ Oxford Dictionary
"The science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring... from those in a representative sample."
Statistics ~ simple
The practice of collecting data from a smaller number of individuals to draw conclusions about a larger number of individuals.
Descriptive Statistics
This involves organizing and summarizing data through numerical, tabular, and graphical means. Estimation procedures are formulas used to obtain estimates (i.e. descriptive statistics) of population traits from the sample data.
Inference Procedures
The formulas used to quantify the magnitude of error in generalizing from a ‘part’ (sample) to the ‘whole’ (population), e.g. t-test, Z-test, chi-square test, Wald test. The inference statistics obtained may reflect the reliability of the result or the possibility of error.
Inference
A conclusion reached on the basis of evidence and reasoning. In statistics, we learn things from samples and infer them to the population.
Population
The entire set of individuals to which the study results are to be extrapolated (generalized).
Parameter
This is a numerical summary of a population.
Sample
A subset of the population.
Sampling Plan
The methodology for choosing the sample.
Statistic
This is a numerical summary of a sample.

Types of estimates

Point
Interval
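
The sheet only names the two types of estimate. As a rough sketch of the difference: a point estimate is a single number (here the sample mean), while an interval estimate is a range around it. The data, the function name `mean_with_interval`, and the use of a normal-approximation critical value are all assumptions made for illustration; for small samples a t-based critical value is usually preferred.

```python
import math
import statistics


def mean_with_interval(sample, confidence=0.95):
    """Return (point_estimate, (lower, upper)) for the population mean."""
    n = len(sample)
    point = statistics.mean(sample)                # point estimate
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    # Assumption: normal approximation (z is about 1.96 for 95% confidence).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return point, (point - z * se, point + z * se)


heights_cm = [162, 170, 158, 175, 168, 181, 165, 172]  # made-up sample
point, (low, high) = mean_with_interval(heights_cm)
print(f"point estimate:    {point:.1f}")
print(f"interval estimate: {low:.1f} to {high:.1f}")
```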

Types of Study Design

• Randomized Controlled Trials
• Cohort Studies
• Case-Control Studies
• Cross-Sectional Studies

Framework of a Typical Study

Exploratory Data Analysis

 
The first step in dealing with data is to organize your thinking about the data.
Exploratory data analysis
This is the process of using statistical tools and ideas to examine data in order to describe their main features.
Steps of an exploratory data analysis
• Examine each variable first.
• Then study the relationships among the variables.
• Begin with a graph or graphs.
• Add numerical summaries of specific aspects of the data.
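
As one possible illustration of the "numerical summaries" step, the sketch below summarizes a single variable using only Python's standard library. The blood-pressure readings and the helper name `describe` are invented for the example and do not come from the course.

```python
import statistics


def describe(values):
    """Return a few common numerical summaries for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


systolic_bp = [118, 122, 130, 125, 140, 135, 128]  # made-up readings
summary = describe(systolic_bp)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```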

Variables

The characteristics of the individuals within the population.

Variables Tree

 

All about Variables

Categorical Data or Variables
Data can only take on discrete values. It allows for classification of individuals based on some attribute or characteristic.
Nominal Data or Variables
Categories apply in name only, with no inherent ordering (e.g. blood type, hair color).
Ordinal Data or Variables
Data can be ranked in order but only take on discrete values (e.g. satisfaction score, Glasgow Coma Score).
Continuous/Measured Data or Variables
These values can be added or subtracted and provide meaningful results.
Interval Data or Variables
The difference between each number/value is equal (e.g. temperature in Celsius, IQ score). There is no absolute zero: zero is just another value on the scale and does not mean the quantity is absent.
Ratio Data or Variables
These values are on an interval scale with an absolute zero, where zero means that none of the quantity is present (e.g. weight, temperature in Kelvin).
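
A small sketch of why the interval/ratio distinction matters, using the two temperature scales mentioned above. The two temperatures are arbitrary example values. Differences agree on both scales, but only the Kelvin ratio is physically meaningful, because only Kelvin has an absolute zero.

```python
def celsius_to_kelvin(temp_c):
    """Convert a Celsius temperature to Kelvin."""
    return temp_c + 273.15


a_c, b_c = 10.0, 20.0  # arbitrary example temperatures in Celsius
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on an interval scale and match across scales.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios are only meaningful on a ratio scale.
ratio_c = b_c / a_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = b_k / a_k  # roughly 1.035

print(f"difference: {diff_c:.2f} °C = {diff_k:.2f} K")
print(f"ratio in Celsius: {ratio_c:.3f}, ratio in Kelvin: {ratio_k:.3f}")
```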

Graphing Variables

Categorical-Pie Chart
This shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percentages for the categories.
Categorical-Bar Chart
This represents each category as a bar whose height shows the category count or percentage.
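
Both charts are drawn from the same table of category counts or percentages. The sketch below builds that table for a made-up list of blood types and prints a crude text bar chart; the data are purely illustrative.

```python
from collections import Counter

blood_types = ["A", "O", "B", "O", "AB", "A", "O", "O"]  # made-up data

counts = Counter(blood_types)        # count per category
total = sum(counts.values())
percents = {cat: 100 * n / total for cat, n in counts.items()}

# One row per category, most frequent first, with a text "bar".
for cat, n in counts.most_common():
    print(f"{cat:>2}  {n}  {percents[cat]:5.1f}%  {'#' * n}")
```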
Continuous-Histograms
These divide the range of values into intervals and show how many individuals (or what percentage of individuals) fall into each interval.
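
The counting behind a histogram can be sketched as follows. The ages, the bin width, and the helper name `histogram_counts` are assumptions chosen for the example. Each interval includes its lower edge and excludes its upper edge.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside the covered range are ignored
            counts[idx] += 1
    return counts


ages = [23, 27, 31, 35, 36, 41, 44, 45, 52, 58]  # made-up data
counts = histogram_counts(ages, start=20, width=10, n_bins=4)
percents = [100 * c / len(ages) for c in counts]

for i, (c, p) in enumerate(zip(counts, percents)):
    low = 20 + 10 * i
    print(f"{low}-{low + 10}: {c} ({p:.0f}%)")
```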
Continuous-Boxplot
• The Lower Inner Fence (LIF): LIF = Q1 - 1.5 × IQR
• The Upper Inner Fence (UIF): UIF = Q3 + 1.5 × IQR
• The Lower Adjacent Value: the actual data value just inside the LIF
• The Upper Adjacent Value: the actual data value just inside the UIF
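
The fence formulas above can be sketched in code as follows. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`, so Q1 and Q3 may differ slightly from values computed by hand. The data and the function name `boxplot_fences` are made up.

```python
import statistics


def boxplot_fences(values):
    """Compute inner fences, adjacent values, and outliers for a boxplot."""
    data = sorted(values)
    # Assumption: "inclusive" quartile convention; other conventions exist.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lif = q1 - 1.5 * iqr  # Lower Inner Fence
    uif = q3 + 1.5 * iqr  # Upper Inner Fence
    inside = [x for x in data if lif <= x <= uif]
    return {
        "LIF": lif,
        "UIF": uif,
        "lower_adjacent": inside[0],   # smallest data value inside the fences
        "upper_adjacent": inside[-1],  # largest data value inside the fences
        "outliers": [x for x in data if x < lif or x > uif],
    }


result = boxplot_fences([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # made-up data
print(result)
```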

Tables-Continuous/categorical variables

Advantages :)
• Gives the reader a compact and structured synthesis of information
• Shows a lot of detail in a small amount of space
Disadvantages :(
• Because the reader only sees numbers, the table may not be readily understood without comparing it with other tables

Skew

Skewness

Right or positively skewed distributions will yield skewness values > 0.
Left or negatively skewed distributions will yield skewness values < 0.
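
To see the sign rule in action, here is a minimal sketch that computes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). Statistical software often applies a small-sample correction, so its values may differ slightly. The data sets are invented.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]    # long tail to the right
left_skewed = [-x for x in right_skewed]    # mirror image, tail to the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(f"{name:>9}: {skewness(data):+.3f}")
```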

Kurtosis

Kurtosis is often described as a measure of how “peaked” or “flat” a distribution is; more precisely, it reflects how heavy the tails are.
It is usually judged in comparison to a bell-shaped (normal) distribution, which has a kurtosis of 3.
– If a distribution has fewer observations in the tails than a normal distribution (light tails), it is platykurtic (kurtosis < 3) and is traditionally described as flatter.
– If a distribution has more observations far from the mean than a normal distribution (heavy tails), it is leptokurtic (kurtosis > 3) and is traditionally described as more peaked.
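
A minimal sketch of the kurtosis thresholds quoted above, using the moment-based coefficient (fourth central moment divided by the squared second), for which a normal distribution scores 3. Some software reports "excess kurtosis" instead, which subtracts 3. The two data sets are made up to show one value below 3 and one above.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (a normal distribution gives 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # no extreme values
heavy_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]    # a few far-out values

print(f"evenly spread: {kurtosis(evenly_spread):.2f}")  # below 3 (platykurtic)
print(f"heavy tailed:  {kurtosis(heavy_tailed):.2f}")   # above 3 (leptokurtic)
```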

So, about sampling...

Probability sampling
Random selection: every member of the population has an equal chance of being selected.
Non-probability sampling
Convenience sampling or voluntary self-selection makes some individuals more likely to be selected than others.
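
The contrast between the two approaches can be sketched with a hypothetical population of 100 numbered individuals. The population, the sample size, and the fixed seed are assumptions made so that the example is reproducible.

```python
import random

population = list(range(1, 101))  # hypothetical IDs 1..100

# Probability sampling: a simple random sample, in which every
# individual has the same chance of being chosen.
rng = random.Random(42)  # fixed seed only to make the example repeatable
probability_sample = rng.sample(population, k=10)

# Non-probability (convenience) sampling: take whoever is easiest to
# reach, here simply the first ten on the list.
convenience_sample = population[:10]

print("random sample:     ", sorted(probability_sample))
print("convenience sample:", convenience_sample)
```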
Parametric tests
In these tests, reasonable and evidence-supported assumptions must be made about the distribution. They can be used to make strong statistical inferences when data are collected using probability sampling.
Non-parametric tests
Very few assumptions are made, if any, about the population distribution. They are more appropriate for non-probability samples, but they result in weaker inferences about the population.
Significance level (alpha)
The risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
Statistical power
The probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
Expected effect size
A standardized indication of how large the expected result of your study will be, usually based on other similar studies.
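
Significance level, power, and expected effect size are typically combined to plan a sample size. The sketch below uses a common normal-approximation formula for comparing the means of two equal-sized groups, n per group = 2 × ((z for alpha + z for power) / d)², where d is the standardized effect size. This formula is not from the cheat sheet; it is offered as an assumption-laden illustration, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 80%
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller expected effects need larger samples.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {sample_size_per_group(d)} per group")
```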
