Basics of Statistics ~ Definitions
Statistics ~ Oxford Dictionary 
"The science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring... from those in a representative sample." 
Statistics ~ simple 
The practice of collecting data from a small(er) number of individuals to draw conclusions about a large(r) number of individuals. 
Descriptive Statistics 
This involves organizing and summarizing data through numerical, tabular, and graphical means. Estimation procedures are the formulas used to obtain estimates (i.e. descriptive statistics) of population traits from the sample data. 
Inference Procedures 
The formulas used to quantify the magnitude of error in generalizing from a ‘part’ (sample) to the ‘whole’ (population), e.g. t-test, Z-test, chi-square test, Wald test, etc. The inference "statistics" obtained may reflect the reliability of the result or the possibility of error. 
Inference 
A conclusion reached based on evidence and reasoning. In statistics, we learn things from samples and infer to the population. 
Population 
The entire set of individuals to which the study results are to be extrapolated (generalized). 
Parameter 
This is a numerical summary of a population. 
Sample 
A subset of the population 
Sampling Plan 
The methodology for choosing the sample. 
Statistic 
This is a numerical summary of a sample. 
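The relationship between population, sample, parameter, and statistic can be sketched in a few lines of Python. The blood-pressure values below are simulated and purely hypothetical, and the simple random sample of 100 is just one possible sampling plan.

```python
import random
import statistics

# Hypothetical population: simulated systolic blood pressure for 10,000 individuals.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population.
population_mean = statistics.mean(population)

# Sampling plan (assumed here): simple random sample of 100, without replacement.
sample = rng.sample(population, 100)

# Statistic: the same summary, computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

In a real study the parameter is unknown; the statistic is what we use to estimate it.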
Types of Study Design
• Randomized Controlled Trials
• Cohort Studies
• Case-Control Studies
• Cross-Sectional Studies
Framework of a Typical Study
Exploratory Data Analysis

The first step in dealing with data is to organize your thinking about the data. 
Exploratory data analysis 
This is the process of using statistical tools and ideas to examine data in order to describe their main features. 
Steps of an exploratory data analysis 
• Examine each variable first.
• Then study the relationships among the variables.
• Begin with a graph or graphs.
• Add numerical summaries of specific aspects of the data.
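The steps above might look like the following minimal sketch. The data set, the variable names, and the helper `pearson_r` are hypothetical and chosen only for illustration; the graphing step is left out to keep the example free of plotting libraries.

```python
import math
import statistics

# Hypothetical data for seven individuals (illustrative values only).
data = {
    "age": [25, 32, 41, 47, 53, 60, 68],
    "systolic_bp": [112, 118, 121, 130, 128, 139, 145],
}

# Step 1: examine each variable on its own with numerical summaries.
summaries = {}
for name, values in data.items():
    summaries[name] = {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Step 2: study the relationship between the variables.
r = pearson_r(data["age"], data["systolic_bp"])

for name, summary in summaries.items():
    print(name, summary)
print(f"correlation between age and systolic_bp: {r:.2f}")
```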
Variables
The characteristics of the individuals within the population 


All about Variables
Categorical Data or Variables 
These data can take on only discrete values, which allows individuals to be classified by some attribute or characteristic. 
Nominal Data or Variables 
Categories apply in name only, with no inherent ordering (e.g. blood type, hair color). 
Ordinal Data or Variables 
Data can be ranked in order but only take on discrete values (e.g. satisfaction score, Glasgow Coma Score). 
Continuous/Measured Data or Variables 
These values can be added or subtracted and provide meaningful results. 
Interval Data or Variables 
The differences between successive values are equal (e.g. temperature in Celsius, IQ score). There is no absolute zero: a value of 0 is an arbitrary point on the scale, not the absence of the quantity (0 °C is still a temperature). 
Ratio Data or Variables 
These values are on an interval scale with an absolute zero, where 0 means that none of the quantity is present, so ratios between values are meaningful (e.g. weight, temperature in Kelvin). 
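A small worked example may help separate interval from ratio scales. It uses the two temperature scales named above; the specific temperatures are arbitrary. Differences agree on both scales, but only the Kelvin ratio is meaningful.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0  # arbitrary example temperatures

# Differences are meaningful on an interval scale and match across scales.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on a ratio scale (one with an absolute zero).
naive_ratio = warm_c / cool_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

print(f"difference: {diff_c:.1f} °C, {diff_k:.1f} K")
print(f"ratio in Celsius: {naive_ratio:.3f}, ratio in Kelvin: {true_ratio:.3f}")
```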
Graphing Variables
Categorical ~ Pie Chart 
This shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percent for the categories. 
Categorical ~ Bar Chart 
This represents each category as a bar whose heights show the category counts or percent. 
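Both charts are built from the same ingredients: the count and the percent for each category. The sketch below computes them for a hypothetical list of blood types and prints a rough text-only bar chart; a real analysis would normally hand these counts to a plotting library.

```python
from collections import Counter

# Hypothetical blood types for ten individuals.
blood_types = ["O", "A", "A", "B", "O", "AB", "O", "A", "O", "B"]

counts = Counter(blood_types)
n = len(blood_types)

# Percent for each category (these would be the slice sizes of a pie chart).
percents = {category: 100 * count / n for category, count in counts.items()}

# Text bar chart: one '#' per individual in the category.
for category, count in counts.most_common():
    print(f"{category:>2} | {'#' * count} {count} ({percents[category]:.0f}%)")
```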
Continuous ~ Histograms 
Divide the range of the variable into intervals, then count how many individuals (or what percentage of individuals) fall into each interval. 
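The counting behind a histogram can be sketched as follows. The ages, the bin width of 10, and the helper name `histogram_counts` are assumptions made for illustration; each interval includes its lower bound and excludes its upper bound.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the range are ignored
            counts[index] += 1
    return counts

# Hypothetical ages of twelve individuals.
ages = [23, 27, 31, 35, 38, 42, 44, 47, 51, 58, 63, 69]

bin_counts = histogram_counts(ages, start=20, width=10, n_bins=5)
bin_percents = [100 * count / len(ages) for count in bin_counts]

for i, (count, percent) in enumerate(zip(bin_counts, bin_percents)):
    low = 20 + 10 * i
    print(f"[{low}, {low + 10}) | {'#' * count} {count} ({percent:.1f}%)")
```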
Continuous ~ Boxplot 
• The Lower Inner Fence (LIF)
  • LIF = Q1 - 1.5 × IQR
• The Upper Inner Fence (UIF)
  • UIF = Q3 + 1.5 × IQR
• The Lower Adjacent Value
  • The actual data value just inside the LIF
• The Upper Adjacent Value
  • The actual data value just inside the UIF
Here Q1 and Q3 are the first and third quartiles, and IQR = Q3 - Q1 is the interquartile range.
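The fence formulas above can be worked through on a small data set. The values are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other quartile conventions exist and can give slightly different fences.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 30]

# Quartiles (assumption: 'inclusive' interpolation method).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Inner fences.
lif = q1 - 1.5 * iqr
uif = q3 + 1.5 * iqr

# Adjacent values: the most extreme data values still inside the fences.
inside = [x for x in data if lif <= x <= uif]
lower_adjacent = min(inside)
upper_adjacent = max(inside)

# Values beyond the fences are usually drawn as individual points.
outliers = [x for x in data if x < lif or x > uif]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"LIF={lif}, UIF={uif}")
print(f"adjacent values: {lower_adjacent}, {upper_adjacent}; outliers: {outliers}")
```

In a boxplot the whiskers extend to the adjacent values, not to the fences themselves.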
Tables ~ Continuous/categorical variables
Advantages :) 
• Gives the reader a compact and structured synthesis of information
• Shows a lot of detail in a small amount of space
Disadvantages :( 
• Because the reader only sees numbers, the table may not be readily understood without comparing it with other tables 
Skewness
Right or positively skewed distributions will yield skewness values > 0 
Left or negatively skewed distributions will yield skewness values < 0 
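The sign convention can be checked with a short sketch. It assumes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second central moment); statistical packages may apply additional small-sample corrections. The data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]    # long tail to the right
left_skewed = [-v for v in right_skewed]    # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(f"right-skewed: {skewness(right_skewed):+.2f}")  # positive
print(f"left-skewed:  {skewness(left_skewed):+.2f}")   # negative
print(f"symmetric:    {skewness(symmetric):+.2f}")     # zero
```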
Kurtosis
Kurtosis is usually judged in comparison with a bell-shaped (normal) distribution, which has a kurtosis of 3 
Kurtosis is often described as a measure of how “peaked” or “flat” a distribution is; more precisely, it measures how heavy the tails are 
– If a distribution has fewer observations in the tails than a normal distribution, it tends to have a flat-topped appearance (Platykurtic, Kurtosis < 3) 
– If a distribution has more observations far from the mean (i.e. in the tails) than a normal distribution, it tends to have a sharp peak with long tails (Leptokurtic, Kurtosis > 3) 
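A brief sketch of the comparison with the normal reference value of 3. It assumes the moment-based (non-excess) kurtosis, the fourth central moment divided by the square of the second; some software reports "excess" kurtosis instead, which subtracts 3. All three data sets are constructed for illustration only.

```python
from statistics import NormalDist, fmean

def kurtosis(values):
    """Moment-based (non-excess) kurtosis: m4 / m2**2, using population moments."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

# Evenly spaced quantiles of a standard normal: kurtosis close to 3.
n_points = 2000
normal_like = [NormalDist().inv_cdf((i + 0.5) / n_points) for i in range(n_points)]

# Evenly spread values with no real tails: platykurtic.
flat = list(range(1, 101))

# Mostly central values plus two extreme ones: leptokurtic.
heavy_tailed = [0] * 18 + [-10, 10]

print(f"normal-like:  {kurtosis(normal_like):.2f}")
print(f"flat:         {kurtosis(flat):.2f}")
print(f"heavy-tailed: {kurtosis(heavy_tailed):.2f}")
```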
So, about sampling...
Probability sampling 
Random selection: every member of the population has a known chance of being selected (an equal chance under simple random sampling) 
Nonprobability sampling 
Convenience sampling or voluntary self-selection increases the likelihood that some participants are selected over others 
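The difference between the two approaches can be illustrated with a hypothetical population of 1,000 numbered individuals whose measured value happens to follow the order in which they are listed. The "first 50 on the list" rule below is only one assumed form of convenience sampling.

```python
import random
import statistics

# Hypothetical population: the measured value equals the position on the list.
population = list(range(1, 1001))
population_mean = statistics.mean(population)

# Probability sampling: simple random sample, each member equally likely.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
random_sample = rng.sample(population, 50)

# Nonprobability sampling: take whoever is easiest to reach (the first 50).
convenience_sample = population[:50]

print(f"population mean:         {population_mean}")
print(f"random sample mean:      {statistics.mean(random_sample)}")
print(f"convenience sample mean: {statistics.mean(convenience_sample)}")
```

The convenience sample can only ever describe the start of the list, so its mean is far from the population mean no matter how many times the study is repeated.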
Parametric tests 
In these tests, reasonable and evidence-supported assumptions must be made about the population distribution. They can be used to make strong statistical inferences when data are collected using probability sampling. 
Nonparametric tests 
Very few assumptions are made, if any, about the population distribution. They are more appropriate for nonprobability samples, but they result in weaker inferences about the population. 
Significance level (alpha) 
The risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%. 
Statistical power 
The probability of your study detecting an effect of a certain size if there is one, usually 80% or higher. 
Expected effect size 
A standardized indication of how large the expected result of your study will be, usually based on other similar studies. 
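Significance level, power, and expected effect size together determine how many participants a study needs. The sketch below assumes one specific setting that is not stated in the text: a two-sided comparison of two group means with equal group sizes, a standardized effect size (Cohen's d), and the normal approximation n = 2 × ((z for 1 - alpha/2 + z for power) / d)². The function name `n_per_group` is hypothetical, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison
    of means, using the normal approximation (an assumption of this sketch)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = standard_normal.inv_cdf(power)          # about 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Smaller expected effects need many more participants.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```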
