Cheatography

Inferential Statistics Cheat Sheet (DRAFT) by

Making generalizations about a population parameter based on a sample statistic

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Infere­ntial Statistics Pathway

Infere­ntial Statistics

Concept:
Making generalizations about a population parameter based on a sample statistic
Examples
Is there a difference in participation in local decision-making between low-income and middle-income groups in New Jersey?
 
What are the reactions of New Brunswick residents to Rutgers University's new investments in the development of College Avenue?
 
How effective is the coronavirus vaccine developed by the pharmaceutical company Johnson and Johnson?

Primary Data: Sampling Methods

1) Probab­ility Sampling
Concept:
Sample is selected randomly (Equal Probab­ility of Selection Method – EPSEM).
Methods:
Simple random / Systematic / Cluster / Stratified
Applic­ation
Diverse population / Genera­liz­ation is required
2) Non-Pr­oba­bility Sampling
Concept:
Sample selection is based on the subjective judgment of the researcher.
Methods:
Convenience / Judgement / Snowball / Quota / Consecutive
Applic­ation
Homogeneous population / Pilot study

Probab­ility Sampling Methods

Simple Random Sampling
Select the sample by drawing lots or by using random numbers, so that every member has an equal chance of selection.
Systematic Sampling
1) Number all members of the population sequentially. 2) From a random starting point, select every nth individual.
Cluster Sampling
1) Divide the population into non-overlapping clusters. 2) Sample all members of some clusters (Ex. Sample all nurses in 5 hospitals in New Jersey).
Stratified Sampling
1) Divide the population into non-overlapping strata. 2) Sample some members from every stratum (Ex. Sample some nurses in every hospital in New Jersey).
Convenience Sampling
Create a sample by using data from population members that are readily available. Because selection is not random, this is strictly a non-probability method.
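
The four random methods above can be sketched in Python as follows. This is only an illustration under stated assumptions: the population of 10 hospitals with 20 nurses each is hypothetical, and the function names are made up for this example.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Random start within the first `step` members, then every step-th member."""
    start = rng.randrange(step)
    return population[start::step]

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep ALL members of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def stratified_sample(strata, per_stratum, rng):
    """Keep SOME randomly chosen members from EVERY stratum."""
    return [member for name in sorted(strata)
            for member in rng.sample(strata[name], per_stratum)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical population: 10 hospitals with 20 nurses each
hospitals = {f"H{h}": [f"H{h}-nurse{i}" for i in range(20)] for h in range(10)}
all_nurses = [nurse for h in sorted(hospitals) for nurse in hospitals[h]]

srs = simple_random_sample(all_nurses, 25, rng)
systematic = systematic_sample(all_nurses, 8, rng)
cluster = cluster_sample(hospitals, 5, rng)       # all nurses in 5 hospitals
stratified = stratified_sample(hospitals, 3, rng) # 3 nurses in every hospital
print(len(srs), len(systematic), len(cluster), len(stratified))
```

Note how the cluster sample covers only 5 hospitals, while the stratified sample touches every hospital.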

Non-Pr­oba­bility Sampling Methods

Judgement Sampling
Select subjects based on the researcher's knowledge and judgment of who is best suited to participate; some subjects are a better fit for the research than others.
Snowball Sampling
Once the researcher finds suitable subjects, they ask those subjects to help find similar subjects until the sample is reasonably large. Good for small populations.
Quota Sampling
Select an equal or proportionate number of subjects from each quota group; quotas are usually based on age, gender, education, race, religion, or socioeconomic status. (Ex. For a sample size of 100, the researcher can select 25 1st-year, 25 2nd-year, 25 3rd-year, and 25 4th-year students.)
Consecutive Sampling
Very similar to convenience sampling, except that it seeks to include ALL accessible subjects in the sample.

Data Biases

Moderator / Interv­iewer bias
The modera­tor’s facial expres­sions, body language, tone, manner of dress, and style of language may introduce bias. Similarly, the modera­tor’s age, social status, race, and gender can produce bias.
Biased Questions
A biased question influences respondents' answers. The way a question is asked, or vague wording, can bias it.
Biased Answers
A biased answer is an untrue or only partially true response. Related response biases include: 1) Nonresponse: individuals cannot or refuse to respond to a survey, 2) Untruthful responses, 3) Faulty recall: respondents do not remember accurately, 4) Voluntary response: individuals with strong feelings about a subject are more likely than others to respond.
Biased Samples
Poor screening and recruiting cause biased samples. Examples are biases of time and location.
Biased Reporting
Experi­ences, beliefs, feelings, wishes, attitudes, culture, views, state of mind, reference, error, and person­ality can bias analysis and reporting.
 

Estimation of Population Parameter

Sample Statis­tics:
Because collecting data from every individual in a population is constrained by time, money, and labor, we usually depend on sample information.
Point Estimate:
The value of the population parameter is unknown, but a sample statistic is available and can be used to estimate it.
Charac­ter­istics of Sample Statis­tics:
Sample should represent the population value
Sampling techniques
Sample size

Estimation of Population Value

Concept:
Estimation of population parameter based on a sample statistic
Example:
Rutgers Parking Authority collected data from a random sample of 220 Rutgers University students in order to estimate commuting time of all the university students.
 
What percent of New Brunswick residents are aware of the advocacy planning efforts of the local govern­ments in Middlesex County?
 
Study of 937 adults in NJ shows mean choles­terol level of 196 and standard deviation of 18 points. What can be concluded about the choles­terol level of all adults in NJ?

Estima­tion: Confidence Interval

Concept of the Level of Confid­ence:
The confidence level reflects how often intervals built this way capture the actual population value: at a 95% level, about 95% of intervals from repeated samples would contain it.
Structure:
A confidence interval is a method of inferential statistics that uses a sample statistic to estimate a population parameter. It is based on a point estimate and a margin of error.
Confidence Interval =
Point Estimate ± Margin of Error
Translation of the Level of Confidence to z score
The z score in the confidence interval formula represents the level of confidence: the area under the standard normal curve between −z and +z equals the level of confidence (for example, z ≈ 1.645 for 90%, 1.96 for 95%, and 2.576 for 99%).
Requir­ements:
Sampling distribution is normal: ▪ Population distribution is normal, or ▪ Sample size is large (n > 100)
 
Population standard deviation (σ) is known, and the sample is taken randomly
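
A minimal Python sketch of a z-based confidence interval for a mean, applied to the cholesterol example above (n = 937, mean 196, standard deviation 18). It assumes the reported standard deviation can stand in for σ because the sample is large; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for: point estimate ± z * sigma / sqrt(n)."""
    # z such that the area between -z and +z equals the confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin

# Assumption: the sample standard deviation (18) is used in place of sigma
lower, upper = z_confidence_interval(mean=196, sigma=18, n=937)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level widens the interval; raising n narrows it.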

Central Limit Theorem

Concept:
A set of ideas about the relationship between a sample and its population (the sampling distribution, its normality, and the difference between sample and population values)
 
The mean of the sampling distribution equals the population mean
 
If population distri­bution is normal, sampling distri­bution is also normal.
 
Even if the population distri­bution is not normal, distri­bution of sample means approaches normal distri­bution as sample size is increased, and sampling distri­bution is almost normal if sample size is large (n >100).
Sampling Error:
is the difference between a sample statistic and the corresponding population parameter.
 
Sampling error cannot be determined exactly, because the population parameter is unknown
 
Sampling error most probably decreases as the sample size is increased.
Standard Error:
is the standard deviation of the sampling distribution; it measures how precisely a sample statistic estimates the population parameter.
 
𝜎⁄√𝑛
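
The ideas in this section can be checked with a small simulation. The sketch below is illustrative only: it assumes a skewed (exponential) population with mean 1 and standard deviation 1, and the sample size and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 100                  # size of each sample
num_samples = 2000       # number of repeated samples

# Exponential population with rate 1: mean = 1, standard deviation = 1 (not normal)
sample_means = [mean(rng.expovariate(1.0) for _ in range(n))
                for _ in range(num_samples)]

center = mean(sample_means)       # compare with the population mean, 1
spread = stdev(sample_means)      # compare with the standard error, sigma / sqrt(n)
standard_error = 1.0 / sqrt(n)
print(f"mean of sample means = {center:.3f}")
print(f"sd of sample means   = {spread:.3f}, sigma/sqrt(n) = {standard_error:.3f}")
```

The mean of the sample means lands near the population mean, and their spread lands near σ/√n, even though the population itself is skewed.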

Sample Size

Concept:
What is the minimum sample size required for a study?
Examples:
A researcher wants to collect data from a random sample of Rutgers University students in order to estimate commuting time of all the university students. How many students must be studied?
 
What is the proper sample size if you want to check a planner's claim that more than 75% of East Brunswick residents support a property tax increase to invest in the development of renewable energy projects for the city?
 
How large a sample must be in order to be 95% sure about the outcome of our study?

Sample Size: Quanti­tative Data (Mean)

Sample Size: Qualit­ative Data (Propo­rtion)
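
A minimal Python sketch of both sample-size calculations, assuming the standard textbook formulas n = (z·σ/E)² for a mean and n = z²·p(1−p)/E² for a proportion, where E is the desired margin of error. The function names and the inputs (σ = 15 minutes, the margins of error) are hypothetical, not values from the examples above.

```python
from math import ceil
from statistics import NormalDist

def z_for(confidence):
    """z score whose central area under the normal curve equals the confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to the next whole subject."""
    return ceil((z_for(confidence) * sigma / margin) ** 2)

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """n = z^2 * p * (1 - p) / E^2, rounded up; p = 0.5 is the most conservative guess."""
    return ceil(z_for(confidence) ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical: commuting-time sd of 15 minutes, margin of error of 2 minutes
n_mean = sample_size_for_mean(sigma=15, margin=2)
# Hypothetical: expected support of 75%, margin of error of 5 percentage points
n_prop = sample_size_for_proportion(margin=0.05, p=0.75)
print(n_mean, n_prop)
```

Always round up: rounding down would give a margin of error slightly larger than requested.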


Infere­ntial Statis­tics: Hypothesis Testing

Concept:
Making inference about population parameter based on a sample statistic
Example:
An officer at Rutgers Parking Authority collected data from a random sample of 220 Rutgers University students in order to check the idea that daily commuting time of Rutgers students is more than 45 minutes.
 
A researcher claims that less than one quarter of New Brunswick residents are aware of the advocacy planning efforts of the local governments in Middlesex County. (Qualitative)
 
Study of 937 adults in NJ shows mean cholesterol level of 196 and standard deviation of 18 points. Can we conclude that, compared to two years ago, the mean cholesterol level of NJ adults has increased?
Confidence Interval:
Make an estimate of a population parameter
Hypothesis Testing:
Test the validity of a claim about a population parameter

Hypothesis Testing: Factors Affecting Decision

Sampling Error
The greater the gap between the sample statistic and the hypothesized population parameter, the greater the probability of rejecting the null hypothesis
Sample Size
For a given gap, the greater the sample size, the greater the probability of rejecting the null hypothesis
The Level of Significance
The greater the level of significance (α), the greater the probability of rejecting the null hypothesis
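
The effect of sample size on the decision can be sketched with a right-tailed one-sample z test. All numbers below (sample mean 47, hypothesized mean 45, σ = 12) are hypothetical, loosely modeled on the commuting-time example; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, mu0, sigma, n):
    """Right-tailed test of H0: mu = mu0 against H1: mu > mu0. Returns (z, p-value)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

alpha = 0.05  # level of significance
for n in (50, 220):
    # Same gap (47 vs 45) and same sigma; only the sample size changes
    z, p = one_sided_z_test(sample_mean=47, mu0=45, sigma=12, n=n)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"n={n}: z={z:.2f}, p={p:.4f} -> {decision}")
```

With the same gap, the larger sample yields a larger z and a smaller p-value, so the null hypothesis is rejected only at n = 220. A larger α would also make rejection more likely.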

Hypothesis Testing: Steps

Hypothesis Testing: One Population

Concept: Research involves study of one population

Hypothesis Testing: Two Popula­tions

Concept:
Making inference about comparison of two population parameters based on two sample statis­tics, one from each population
Example:
Does gender make a difference in duration of commuting time to work?
 
Is Tylenol more popular than Advil?
 
Does GPA vary between underg­raduate and graduate students?
 
A politician claims that when comparing Democrats and Republicans, a greater percentage of Democrats support the idea of dialogue among civilizations. Is this a valid claim?
Indepe­ndent vs Dependent Popula­tions
Indepe­ndent Popula­tions / Samples
The sample selected from one population is not related to the sample selected from the second population. Changes within one population are not related to changes within the other. Samples are randomly selected from these populations.
 
With independent populations, we compare the two samples' summary statistics directly (for example, the difference between the two sample means); data points are not paired.
Example:
Comparing drug A with drug B; comparing the degree of spread of coronavirus in NJ vs CA
Dependent Populations / Samples (Paired or Matched)
Each member of one sample corresponds to a member of the other sample. Data pairs occur naturally, most often with one data point occurring “before” and another occurring “after” an event.
 
With dependent popula­tions, we pair the data points then consider the differ­ences in data points.
Example:
Blood pressure of students before and after exam
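
A minimal Python sketch of the paired approach, using the blood pressure example. The eight before/after readings are hypothetical; the t statistic is computed from the paired differences as t = d̄ / (s_d / √n) with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical systolic blood pressure for the same 8 students
before = [118, 122, 130, 115, 127, 121, 135, 119]
after = [124, 125, 138, 117, 131, 128, 139, 120]

# Pair the data points, then work only with the differences
differences = [a - b for a, b in zip(after, before)]
d_bar = mean(differences)          # mean of the differences
s_d = stdev(differences)           # sample standard deviation of the differences
n = len(differences)
t_stat = d_bar / (s_d / sqrt(n))
df = n - 1
print(f"mean difference = {d_bar:.3f}, t = {t_stat:.2f}, df = {df}")
```

The resulting t is compared with a critical value from the t table for df = 7 (2.365 for a two-tailed test at α = 0.05).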

Hypothesis Testing: Qualit­ative Data

Concept: Making inference about a population parameter based on a sample statistic for qualitative data (nominal and ordinal; proportion)
Ex: Is the proportion of households headed by a single parent in lower-income neighborhoods significantly different from that in the general population?
Ex: Is the police arrest rate in Middlesex County significantly lower than the statewide rate?

Indepe­ndent Popula­tions: Qualit­ative Data (Z)

Requir­ements for qualit­ative data (Propo­rtion, z): Indepe­ndent popula­tions
▪ One sample is taken randomly from each population
▪ To use a normal distri­bution, for each popula­tion:
▪ Sample size is large (n > 100)
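
A minimal Python sketch of a two-proportion z test for independent populations, assuming the usual pooled-proportion standard error. The counts (132 of 200 Democrats and 108 of 200 Republicans in favor) are hypothetical and only echo the politician's claim above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 132 of 200 in sample 1 and 108 of 200 in sample 2 in favor
z = two_proportion_z(132, 200, 108, 200)
p_value = 1 - NormalDist().cdf(z)   # right-tailed: H1 is p1 > p2
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```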

Indepe­ndent Popula­tions: Quanti­tative Data (Z)

Requir­ements: Indepe­ndent popula­tions
▪ One sample is taken randomly from each population
▪ Sampling distribution is normal for each population, i.e. either:
▪ Population distribution is normal, or
▪ Sample size is large
▪ Standard deviation of each population (σ) is known

Indepe­ndent Popula­tions: Quanti­tative Data (T)

Requir­ements for quanti­tative data (Mean, t) : Indepe­ndent popula­tions
▪ One sample is taken randomly from each population
▪ Sampling distri­bution is normal for each popula­tion:
▪ Population distri­bution is normal
▪ Sample size is small
▪ Standard deviation of two popula­tions (σ) is unknown
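
A minimal Python sketch of a two-sample t statistic for small independent samples. It uses the pooled-variance form, which additionally assumes the two populations have equal variances (an assumption not stated above). The GPA values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t, df) for H0: mu1 = mu2, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2

# Hypothetical GPAs for six undergraduate and six graduate students
undergrad = [3.1, 2.8, 3.4, 3.0, 2.9, 3.3]
graduate = [3.6, 3.4, 3.8, 3.5, 3.7, 3.3]
t_stat, df = pooled_t(undergrad, graduate)
print(f"t = {t_stat:.2f}, df = {df}")
```

The t statistic is compared with a critical value from the t table at the given degrees of freedom.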


 

Confidence Interval: Quantitative Data (z)

Quantitative (i.e. Normal): Confidence interval for estimation of mean
▪ Ex: On average, how many hours per week do Rutgers University students study?

T Distri­bution

Confidence Interval: Quanti­tative Data (t)

T-Dist­rib­ution
used when estimating the mean of a normally distri­buted population in situations where the sample size is small, and/or the population standard deviation is unknown.
 
The shape of distri­bution depends on the degrees of freedom (df = n-1).
 
As the degrees of freedom increase, the t distri­bution approaches the standard normal distri­bution.

Confidence Interval: Qualit­ative Data (z)

Qualitative (i.e. Binomial): Confidence interval for estimation of proportion
▪ Ex: A researcher claims that at least 72% of Americans support Green Solutions for sustai­nable urban renewal.
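
A minimal Python sketch of a z-based confidence interval for a proportion, assuming the usual large-sample formula p̂ ± z·√(p̂(1 − p̂)/n). The survey result (360 of 500 respondents in favor) is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 360 of 500 respondents support the policy (p_hat = 0.72)
low, high = proportion_confidence_interval(360, 500)
print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")
```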


Analysis of Variance (ANOVA)

Applic­ations:
Mean differ­ences across several popula­tions
Potent­ials:
Works for comparison of three or more popula­tions
Population with signif­icant variation
Post Hoc Test:
The altern­ative hypothesis is not specific – it only states that at least one of the population means differs from the others. To find which category is different and how much, use post hoc (or after the fact) techni­ques.
Series of two Popula­tions t Test:
To find which group differs, you can run a series of two-population t tests as needed (pick the group of greatest interest, compare it with another group, and so on).
Different Types of ANOVA
One-way ANOVA:
One-way ANOVA is used to test effects of a single indepe­ndent variable (categ­orical) on a single dependent variable (numer­ical).
Example:
Does income vary among African-Americans, Whites, Hispanics, and other ethnic groups? This example includes only ONE dependent variable (income) and ONE independent variable (ethnicity).
Two-way ANOVA:
Two-way ANOVA is used to test effects of TWO indepe­ndent variables on a single dependent variable
Example:
Does income vary in relation to ethnicity and person­ality?
Factorial ANOVA:
Factorial ANOVA is used to test effects of SEVERAL (two or more) indepe­ndent variables on a single dependent variable.
ANOVA is appropriate for situations in which we are comparing more than two populations.

ANOVA

Hypothesis Testing: Analysis of Variance (ANOVA)

Concept:
Test of difference in the mean of several popula­tions (Analysis of Variance)
 
Think of ANOVA as an extension of the t test to more than two populations
Example:
Is there a difference among Protes­tants, Catholics and Jews in terms of number of children?
 
How do Republ­icans, Democrats, and Indepe­ndents vary in terms of income?
 
How do older, middle­-aged, and younger people vary in terms of superm­arket shopping duration?
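
A minimal Python sketch of the one-way ANOVA F statistic, computed as the between-group mean square divided by the within-group mean square. The shopping times for the three age groups are hypothetical, and the function name is illustrative.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical supermarket shopping times in minutes
younger = [25, 30, 28, 32, 30]
middle_aged = [35, 38, 33, 36, 38]
older = [45, 42, 48, 44, 46]
f_stat, df_b, df_w = one_way_anova_f([younger, middle_aged, older])
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w})")
```

A large F suggests that at least one population mean differs; it does not say which one, which is why a post hoc test follows.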


 

Hypothesis Testing: Chi Square

Chi Square Formula

Chi Square: Test of Indepe­ndence

Concept:
Test the relationship between two categorical variables through their categories
Example:
We want to know if gender and type of food people prefer are associ­ated.
 
We want to know if education level and political affili­ation are associ­ated.
 
Is type of sport students like associated with their ethnic backgr­ound?
Applic­ation:
A Chi-Square (χ²) Test of Independence is used to determine whether a significant association exists between two categorical variables.
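
A minimal Python sketch of the test of independence, assuming the standard statistic χ² = Σ (observed − expected)² / expected, with expected = row total × column total / grand total and df = (rows − 1)(columns − 1). The contingency table is hypothetical.

```python
def chi_square_independence(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = gender, columns = preferred type of food
observed = [[30, 20, 10],
            [20, 30, 40]]
chi2, df = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The statistic is compared with a critical value from the chi-square table (5.991 for df = 2 at α = 0.05).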
 
 

Chi Square: Test of Goodne­ss-­of-Fit

Concept:
Test validity of a particular distribution of population subcategories (how well sample data fit an expected or claimed distribution).
Example
A professor claims that in metrop­olitan areas, 19% of college students are Black, 42% are White, 16% are Hispanic, and the rest have a different ethnic backgr­ound. Is this a valid claim?
 
Is it valid to assume that local gyms have their highest attendance on Mondays, Tuesdays and Saturdays, average attendance on Wednesdays and Thursdays, and lowest attendance on Fridays and Sundays?
 
A researcher claims that season makes no difference in number of gun violations in large urban centers.
Applic­ations
Study pattern of distri­bution of subgroups
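
A minimal Python sketch of the goodness-of-fit test applied to the professor's claim above (19% Black, 42% White, 16% Hispanic, and the remaining 23% other). The observed counts for a sample of 400 students are hypothetical; the statistic is χ² = Σ (observed − expected)² / expected with df = categories − 1.

```python
def chi_square_goodness_of_fit(observed, claimed_shares):
    """Return (chi2, df) comparing observed counts with a claimed distribution."""
    total = sum(observed)
    expected = [total * share for share in claimed_shares]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

claimed = [0.19, 0.42, 0.16, 0.23]   # shares from the professor's claim
observed = [84, 160, 60, 96]         # hypothetical counts, n = 400
chi2, df = chi_square_goodness_of_fit(observed, claimed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

A small statistic (below the critical value of 7.815 for df = 3 at α = 0.05) means the sample gives no reason to reject the claimed distribution.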
