
Descriptive Statistics Cheat Sheet (DRAFT) by

Concept of Statistics

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Descri­ptive Statistics Pathway

Types of Statis­tical Methods

Descri­ptive Statistics
Methods that help us describe a sample or a population. Example: Standard deviation
Process
1- Identify population or sample 2- Identify variable of interest 3- Collect data 4- Describe data
Infere­ntial Statistics
Methods that help us use sample information to draw conclusions about the population. Example: Confidence interval
Process
1- Identify population 2- Identify variable of interest 3- Collect sample data 4- Inference about population based on sample data (Make statements or predic­tions about the popula­tion)

Population /Sample

Ch1. Review Questions

Walter Wallace's wheel of science best illust­rates which of the following?
The relati­onship between theory and data
A charac­ter­istic of the individual to be measured or observed is:
Variable
You are studying the relationship between activity level and body fat percentage of females in New Jersey. What are the independent variable and the control variable of your study?
Activity level: independent variable. Gender: control variable.
Identify: 1) the level of measur­ement 2) Whether the variable is discrete or continuous
What is the main mode of transp­ort­ation you use to commute to work? 1. by foot 2. by bicycle 3. by private vehicle, including car, truck, van, taxicab, and motorcycle 4. by public transp­ort­ation, including bus, rail, and ferry
1) Level of measurement: Nominal. 2) Neither continuous nor discrete.
People convicted of first-­degree murder should be executed. Strongly Agree Agree Neither Agree nor Disagree Disagree Strongly Disagree
1) Level of measurement: Ordinal. 2) Neither continuous nor discrete.

Methods

Data Collection
Data Organi­zation
Data Presen­tation
Data Analysis and Interp­ret­ation

Data Collection Critical Issues

SOP / MOP
A Standard Operating Procedure (SOP) or Manual Operating Procedure (MOP) is a set of step-by-step instructions compiled by a research team to guide individuals in carrying out the research project's operations.
Blind Group
Single Blind: The participant does not know which group (for example, treatment or placebo) they are assigned to. Double Blind: Neither the participant nor the researcher knows the assignment. Triple Blind: The participant, the researcher, and the project manager are all unaware of the assignment.
Placebo
A substance designed to have no therapeutic value.
QA / QC
Quality Assurance (QA) is process oriented and focuses on problem prevention. Quality Control (QC) is a set of approaches, methods, processes, and actions formulated to assure the quality of the project; its focus is on problem identification and correction.

Shape of Data Distri­bution

Variation Analysis: Coeffi­cient of Skewness

Coeffi­cient of Skewness

Concept:
The shape of the data distribution expressed in numerical form
Shows
Direction of skewness (sign). Strength of skewness (value). Comparison of several distributions
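The sheet does not say which skewness formula it has in mind. As a minimal sketch, assuming Pearson's second coefficient, 3 * (mean - median) / standard deviation, the sign and size of the result can be read as described above. The data sets and the function name are hypothetical.

```python
import statistics

def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd.

    This particular formula is an assumption; other skewness
    coefficients exist and give different values.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    return 3 * (mean - median) / sd

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # hypothetical data, long right tail
symmetric = [1, 2, 3, 4, 5]               # hypothetical symmetric data

skew_right = pearson_skewness(right_skewed)
skew_sym = pearson_skewness(symmetric)
print(skew_right, skew_sym)
```

A positive value suggests a longer right tail, a negative value a longer left tail, and a value near zero a roughly symmetric distribution.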

Variation Analysis: Coeffi­cient of Variation

Coeffi­cient of Variation

Concept:
The spread of the data relative to the mean, expressed in numerical form (standard deviation divided by the mean)
Potentials
Shows the degree of variation within the data set. Easy to compare several distributions
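The coefficient of variation is commonly defined as the standard deviation divided by the mean, expressed as a percentage. The sketch below assumes that definition and uses the sample standard deviation; the two data sets are hypothetical and only illustrate comparing distributions measured in different units.

```python
import statistics

def coefficient_of_variation(data):
    """CV in percent: sample standard deviation / mean * 100."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    return sd / mean * 100

heights_cm = [160, 165, 170, 175, 180]  # hypothetical data
weights_kg = [50, 60, 70, 80, 90]       # hypothetical data

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 2), round(cv_weights, 2))
```

Because the CV has no unit, the larger value here indicates that the weights vary more, relative to their mean, than the heights do.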

Chebyshev Theorem

Concept: How much of the data lies close to the mean and how much lies far from it.
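Chebyshev's theorem states that, for any distribution, at least 1 - 1/k^2 of the data lie within k standard deviations of the mean (for k > 1). A minimal sketch of that bound follows; the function name is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

# Guaranteed minimum shares for k = 2, 3, 4 standard deviations.
bounds = {k: chebyshev_lower_bound(k) for k in (2, 3, 4)}
for k, share in bounds.items():
    print(f"within {k} sd: at least {share:.1%}")
```

The bound holds regardless of the shape of the distribution, which is why it is weaker than the percentages available when the data are known to be normal.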

Chebyshev Theorem Example

Variation Analysis: Z Score

Concept: How far a data value is from the mean, in units of standard deviation
A positive Z indicates that the value is above average; a negative Z indicates that it is below average
If the data distribution is normal, then Z is a standard Z score
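As a small illustration, a Z score can be computed as (value - mean) / standard deviation. The sketch below assumes the data set is the whole population (so it uses the population standard deviation); the scores are hypothetical.

```python
import statistics

def z_score(value, data):
    """Distance of `value` from the mean, in standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population sd; use stdev() for a sample
    return (value - mean) / sd

scores = [60, 70, 80, 90, 100]  # hypothetical data

z_high = z_score(100, scores)  # above the mean, so positive
z_low = z_score(60, scores)    # below the mean, so negative
z_mean = z_score(80, scores)   # exactly at the mean
print(round(z_high, 3), round(z_low, 3), z_mean)
```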
 

Quanti­tative Vs Qualit­ative Methods

Quanti­tative Research
Test Theori­es/­hyp­othesis
Seeks to Measur­e/Q­uantify
Method­ology
o Numerical Values
o Closed Questions
Required Data
Requires Many Respon­dents
Outcomes
o May be genera­lized to the popula­tion.
o Guidance for decision making.
Key Terms
Testing, Measur­ement, Object­ivity, replic­ability

Quanti­tative Vs Qualit­ative Methods

Qualit­ative Research
Seeks to explain and unders­tand.
Exploring ideas and formul­ating theory and hypothesis
Method­ology
o Text/N­arr­atives
o Open-ended Questions
Required Data
Requires Few respon­dents.
Outcome
o Not projec­table to total popula­tions
o Unders­tan­dings for new ideas
Key Terms
Unders­tan­ding, context, comple­xity, subjec­tivity

Popula­tion/ Sample

Population
all individuals or objects under study.
Sample
a portion of popula­tion.
Indivi­dua­l/Unit of Analysis
an individual or object under study.
Parameter
a numerical measure that describes an aspect of a popula­tion.
Statistic
a numerical measure that describes an aspect of a sample.
Sampling Frame
a list of indivi­duals from which a sample is taken.
Under-­cov­erage
the outcome of omitting some individuals from the sampling frame.
Random Sample
a sample selected in such a manner that every individual has an equal chance of being selected as a member of the sample.
Sampling Error
the difference between a sample statistic and the corresponding population parameter. Sampling error usually cannot be determined exactly, because the population parameter is unknown. It tends to decrease as the sample size increases.
Non-sa­mpling Error
Mainly the result of: Poor sample design, proble­matic data collec­tion, biases involved in data collection

Data Organi­zation: Tabular Methods

Data Organi­zation: Frequency Distri­bution

Applic­ations
Distri­bution of data within catego­ries.
Guidelines
Classes must be mutually exclusive (non-overlapping) and exhaustive (a class for every observation).
If possible:
Use 5-10 classes. Use equal class widths (size, range, interval). Avoid open-ended classes (e.g., "More than $100,000"). List all classes, including those with zero frequency.
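The guidelines above can be sketched as a small helper that builds equal-width, non-overlapping classes and lists every class, including empty ones. The function name, the half-open class convention [lower, upper), and the data are assumptions made for illustration.

```python
from collections import Counter

def frequency_distribution(data, lower, width, n_classes):
    """Count observations in equal-width classes [start, start + width)."""
    counts = Counter()
    for x in data:
        index = int((x - lower) // width)
        if not 0 <= index < n_classes:
            # Classes must be exhaustive: every observation needs a class.
            raise ValueError(f"{x} falls outside the defined classes")
        counts[index] += 1
    # List every class, including those with zero frequency.
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]

ages = [21, 25, 34, 38, 39, 52, 58, 59]  # hypothetical data
table = frequency_distribution(ages, lower=20, width=10, n_classes=5)
for start, end, freq in table:
    print(f"{start}-{end}: {freq}")
```

Using half-open intervals is one way to keep classes mutually exclusive: a value on a boundary, such as 30, belongs only to the class that starts at 30.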

Data Presen­tation

Types:
Static, Intera­ctive, Animated
Nature of Data
Univariate Charts
Line Chart, Bar Chart, Pie Chart, Histogram, Bi-His­togram, Polygon, Box-Plot, Q-Q Plot, P-P Plot
Bivariate Charts
Scatter Plot
Multiv­ariate Charts
Star Chart, Radar Chart, Web Chart, Area Chart

Variation Analysis: Box Plot

Variation Analysis: Range

Range
Gap between two data values: the minimum and the maximum
 
Range = R= Maximum - Minimum
Mid-Range
Half of the sum of the maximum and minimum values

Mid-range = MR = (Maximum + Minimum) / 2
Inter-­Qua­rtile Range
IQR=Q3-Q1
Inter-­Qua­rtile Deviation
IQD = QD = (Q3-Q1) / 2
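The range-based measures above can be computed directly. This sketch uses Python's `statistics.quantiles`, whose default "exclusive" method places quartiles at the (n + 1) * p positions; other quartile conventions give slightly different values. The data set is hypothetical.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical data

minimum, maximum = min(data), max(data)
data_range = maximum - minimum        # R = Maximum - Minimum
mid_range = (maximum + minimum) / 2   # MR = (Maximum + Minimum) / 2

# Three cut points: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # IQR = Q3 - Q1
quartile_deviation = iqr / 2          # QD = (Q3 - Q1) / 2

print(data_range, mid_range, iqr, quartile_deviation)
```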

Index of Qualit­ative Variation

Concept:
The index of qualit­ative variation (IQV) is a measure of variab­ility for nominal data.
IQV =
Number of observed differ­ences / Maximum possible differ­ences
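One common way to count the differences, assumed here, is to sum the products of every pair of category frequencies (observed differences) and compare that with the value obtained if the same cases were spread evenly over all categories (maximum possible differences). The function name and frequencies are hypothetical.

```python
from itertools import combinations

def iqv(frequencies):
    """Index of qualitative variation from a list of category frequencies."""
    k = len(frequencies)  # number of categories
    n = sum(frequencies)  # number of cases
    # Observed differences: pairs of cases that fall in different categories.
    observed = sum(a * b for a, b in combinations(frequencies, 2))
    # Maximum differences: cases spread evenly, n / k per category.
    maximum = (k * (k - 1) / 2) * (n / k) ** 2
    return observed / maximum

iqv_even = iqv([10, 10, 10])   # evenly spread
iqv_single = iqv([30, 0, 0])   # all cases in one category
iqv_mixed = iqv([20, 5, 5])    # somewhere in between
print(iqv_even, iqv_single, iqv_mixed)
```

The index runs from 0 (no variation, all cases in one category) to 1 (maximum variation, cases spread evenly).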

Empirical Rule (Normal)

Concept: If data are normally distributed, what percentage of the data lies close to the mean and how much lies far from it.
About 68% of the data fall in the interval μ - 1σ to μ + 1σ
About 95% of the data fall in the interval μ - 2σ to μ + 2σ
About 99.7% of the data fall in the interval μ - 3σ to μ + 3σ
Standard deviation is about one fourth of the range (R/4)
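The percentages in the empirical rule can be checked against the standard normal distribution. This is a quick sketch using `statistics.NormalDist` from the Python standard library; the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to the familiar figures.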

Classical Probab­ility

Ex: When we roll a fair die, what is the probability of rolling a 2?
Probability of 2 = P(2) = 1/6 ≈ 16.7%
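Classical probability divides the number of favorable outcomes by the number of equally likely outcomes. A minimal sketch for the die example follows; the function name is illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favorable outcomes / total equally likely outcomes."""
    favorable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favorable, len(sample_space))

die = [1, 2, 3, 4, 5, 6]  # outcomes of one roll of a fair die

p_two = classical_probability({2}, die)
p_even = classical_probability({2, 4, 6}, die)
print(p_two, f"{float(p_two):.1%}", p_even)
```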
 

Types of Data

Data & Variable

Data
Facts and values of variables collected for analys­is/­pre­sen­tation
Qualitative Data
Values have a qualitative format. Example: Mode of transportation

Quantitative Data
Values have a meaningful quantitative format. Example: Age of customers

Data & Variable

Variable
any characteristic of an individual that can change from individual to individual.
Variable Types
1. Indepe­ndent Variable (expla­natory)
the variable that you purposely change or control in a study or scientific experiment in order to see what effect it has.
2. Dependent Variable (response)
the variable that responds to the change in the indepe­ndent variable.
3. Lurking Variable
the variable that can have effect on the variables being studied but is not included in the study.
4. Control Variable
the variable that you hold constant in order to reduce its effect on the variables being studied.
5. Confounding Variable
a variable that influences both the dependent variable and the independent variable.

Variable Type Example

Example A researcher wants to assess the effect­iveness of drug X on recovery duration.
Indepe­ndent Variable (expla­nat­ory):
Amount of drug X intake (mg)
Dependent Variable (respo­nse):
Recovery duration (days)
Lurking Variables:
Gender, Age, Weight, Ethnicity, Other illnesses, etc
Control Variable:
The researcher will collect data only from patients aged 25-40 to keep age roughly constant.
Confounding Variable:
Other drugs the patient is taking

Frequency Distri­bution: Relative Analysis

Proportion
Frequency / (Sample size)
Percentage
Frequency / (Sample size) * 100
Percent Change
(F2 - F1) / F1 * 100
 
F1 is the earlier frequency. F2 is the later frequency. % change can be positive or negative
Ratio
F1 / F2
 
F1 is the number of cases in 1st category. F2 is the number of cases in 2nd category.
Rate
(# of actual occurrences) / (# of total possibilities)

122 deaths / 7,000 population = 0.0174. Crude Death Rate = 0.0174 * 1000. Crude Death Rate = 17.4 per 1000.
Density
Frequency / (Class width)
 
Density is the only measure that requires quanti­tative data.
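The relative measures in this card are simple ratios, sketched below as small functions. The function names and the `per` parameter are illustrative; the crude death rate figures are the ones used in the card, and the other inputs are hypothetical.

```python
def proportion(frequency, sample_size):
    return frequency / sample_size

def percentage(frequency, sample_size):
    return frequency / sample_size * 100

def percent_change(f1, f2):
    """f1 is the earlier frequency, f2 the later one."""
    return (f2 - f1) / f1 * 100

def ratio(f1, f2):
    """Cases in the 1st category per case in the 2nd category."""
    return f1 / f2

def rate(occurrences, possibilities, per=1000):
    """Actual occurrences per `per` possible occurrences."""
    return occurrences / possibilities * per

def density(frequency, class_width):
    return frequency / class_width

crude_death_rate = rate(122, 7000)  # deaths per 1000 population
print(round(crude_death_rate, 1))
print(proportion(15, 60), percentage(15, 60), percent_change(40, 50))
```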

Data Category/ Level of Measur­eme­nt/­Nature of Data

Qualit­ative (Categ­orical)
Codes: Religious backgr­ound, Political Affili­ation
Nominal :
Objects fall into unordered catego­ries. Example: Race, Religious background
Ordinal:
Objects fall into naturally ordered catego­ries. Example: Perfor­mance, Economic class
Binary
Objects fall into exactly two categories.
Non-Binary
Objects fall into several categories. Example: Level of community participation
Quanti­tative (Numer­ical)
Numbers: Income, Family Size

Mean

Median­/Qu­artiles

Position of Median in Data Set
(𝑛+1)/2
Quartiles
Position of Q1: (𝑛+1)* 0.25
 
Position of Q3: (𝑛+1)* 0.75
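The position formulas above give a 1-based position in the sorted data, which is often fractional. A minimal sketch, assuming linear interpolation between the two neighboring values when the position is not a whole number (a common convention, not the only one); the helper name and data are hypothetical.

```python
def value_at_position(sorted_data, position):
    """Value at a 1-based, possibly fractional position in sorted data.

    Assumes 1 <= position <= len(sorted_data), which holds for the
    quartile positions when there are at least 3 observations.
    """
    whole = int(position)
    fraction = position - whole
    lower = sorted_data[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_data[whole]
    return lower + fraction * (upper - lower)

data = sorted([12, 2, 9, 4, 7, 4, 10, 5])  # hypothetical data
n = len(data)

median = value_at_position(data, (n + 1) / 2)    # position 4.5
q1 = value_at_position(data, (n + 1) * 0.25)     # position 2.25
q3 = value_at_position(data, (n + 1) * 0.75)     # position 6.75
print(median, q1, q3)
```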

Mean Deviation & Standard Deviation

Variation Analysis: Percen­tile, Decile, Quartile

Concept: Where a data value stands in comparison to the rest of the data within the data set
Percentile: Percent of data values that are equal to or less than a given value
Decile: Which decile a value falls in
Quartile: Which quartile a value falls in
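A percentile rank following the definition above (percent of values equal to or less than a given value) can be sketched as below. Deriving the decile and quartile by splitting the percentile rank into 10 or 4 equal bands is an assumption made here for illustration; textbooks differ on boundary handling. The scores are hypothetical.

```python
import math

def percentile_rank(value, data):
    """Percent of data values that are equal to or less than `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return at_or_below * 100 / len(data)

def band(value, data, groups):
    """Which of `groups` equal bands (10 = deciles, 4 = quartiles) the value falls in."""
    rank = percentile_rank(value, data)
    return max(1, math.ceil(rank / (100 / groups)))

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical data

rank_80 = percentile_rank(80, scores)
decile_80 = band(80, scores, groups=10)
quartile_80 = band(80, scores, groups=4)
print(rank_80, decile_80, quartile_80)
```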