Cheatography

PY2103 Statistics Cheat Sheet (DRAFT) by

This is a Cheat Sheet for PY2103 Statistics. I tried to include as much of the necessary information as possible from both the Lecture Slides and the Tutorials.

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Terminology

Effect Size
Magnitude/strength of a relationship
Statistical Significance
A finding is statistically significant when its p-value is at or below the chosen alpha level, i.e., the observed result would be unlikely if the null hypothesis were true. Basically, the finding is unlikely to be due to chance (sampling error) alone. It does NOT give the probability that the null hypothesis is true.
Standard Error
SE = standard deviation of the sampling distribution. Indicates how much a sample mean is likely to differ from the population mean.
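A minimal Python sketch of the standard error of the mean, assuming the usual estimate SE = s / √n, where s is the sample standard deviation (n - 1 denominator) and n is the sample size. The function name and scores are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as SD / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two scores to estimate the SD")
    sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    return sd / math.sqrt(n)

# Hypothetical scores
scores = [4, 8, 6, 5, 3, 7, 9, 6]
print(round(standard_error_of_mean(scores), 3))
```

With these scores s = 2 and n = 8, so SE ≈ 0.707. For the same s, quadrupling n halves the SE.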
Sampling Error
When a sample does not perfectly represent the population it was drawn from, so sample values differ from the true population values
Confidence Interval
A range of values around a sample estimate that is likely to contain the population parameter at a stated level of confidence (e.g., 95%). Basically, if you redid the study many times, about 95% of the 95% intervals you calculated would contain the true value.
Z score
The difference between an individual's score and the mean of the distribution, divided by the standard deviation of the distribution. It represents the number of standard deviations the score is from the mean.
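The z-score formula above, z = (X - M) / SD, as a small Python sketch. The mean of 100 and SD of 15 are hypothetical values chosen for easy arithmetic.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that score x lies from the mean."""
    return (x - mean) / sd

# Hypothetical scale with mean 100 and SD 15
print(z_score(130, 100, 15))  # 2.0  (two SDs above the mean)
print(z_score(85, 100, 15))   # -1.0 (one SD below the mean)
```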
Statistical power
In research design, the probability of correctly rejecting a false null hypothesis, given the sample size, the alpha level and the expected strength of the relationship (effect size).
Alpha value
The probability of a Type I error, i.e., rejecting the null hypothesis when it is actually true (getting a "significant" result due to chance). Calculate it as 1 - the confidence level (e.g., 1 - .95 = .05).
Probability value (p value)
The probability of obtaining a result at least as extreme as the one observed, if H0 were true.

Ethics

Ethics
An evolving set of guidelines to assist researchers in conducting ethical research.
3 areas of research ethics
Relationship between society and science, Professional issues, Treatment of research participants
Relationship between society and science (Diener, E., & Crandall, R. (1978). Ethics in social and behavioral research. University of Chicago Press.)
Concerns the extent to which societal concerns and cultural values should direct the course of scientific investigation (e.g., government funding, corporate support)
Professional issues
Research misconduct = fabricating, falsifying, or plagiarizing in proposing, performing, or reviewing research, or in reporting research results
Treatment of research participants
Fundamental issue = the treatment of and care for participants
APA Code of Conduct
Five general principles: Beneficence & nonmaleficence, Fidelity & responsibility, Integrity, Justice, Respect for people's rights and dignity
Beneficence and nonmaleficence
Beneficence = acting for the benefit of others. Nonmaleficence = doing no harm to others. Minimise the risks and maximise the benefits of research.
Fidelity and responsibility
Refers to how we interact with others: we need to establish a trusting relationship with research participants. Covers issues of informed consent, confidentiality and deception.
Integrity
We should strive to be honest, accurate, and truthful in all professional activities. Poorly conducted research is unethical. Findings should be reported honestly and disseminated widely.
Justice
The benefits and burdens of research should be distributed as fairly as possible. E.g., who receives the benefits of a new treatment.
Respect for people's rights and dignity
Respect for the rights and dignity of people, including their autonomy. E.g., the right to withdraw, freedom from coercion.
APA ethics code, Section 8
Institutional approval, Informed consent, Deception, Debriefing, Coercion/Right to withdraw, Confidentiality/Anonymity/Privacy
Institutional Approval
Institutions with active research programs require research to be reviewed by an IRB/HREC
Informed consent
1. All aspects of the research must be disclosed and must be comprehensible to participants. 2. Participation should be voluntary and free from coercion; participants must be able to make a rational judgement.
Informed consent (cont.)
Active versus passive consent. Active = verbally agreeing and signing a form consenting to participate; with children, guardians return the forms (failure to return = consent denied). Passive = consent is indicated by a guardian not returning the form (failure to return = consent given). Use active consent whenever possible.
Deception
Some types of research require deception. Active deception = deliberately misleading participants with false information. Passive deception = withholding information.
Coercion, right to withdraw
Coercion = feeling pressured to participate. Right to withdraw = participants must always feel free to decline to participate and/or to stop participating at any time.
Privacy, anonymity, confidentiality
Privacy = controlling other people's access to information about a person. Anonymity = keeping the identity of a participant unknown. Confidentiality = not revealing information obtained from a participant to anyone outside the research team.
Ethics of Animal Research
Concerned with animal welfare (improving animals' living conditions and reducing the number of animals used in research), NOT animal rights
Ethics of Animal Research Guidelines
Justification of the research, Personnel, Care and housing of animals, Acquisition of animals, Experimental procedures, Field research, Educational use of animals
Ethical dilemmas
No fixed formula/rule; the decision is a subjective judgment
IRB
Institutional Review Board
Ethical issues during authorship communication
Justice (who receives credit for the research), Fidelity and scientific Integrity (accurate and honest reporting)
Steps to adhere to ethical considerations
Making changes to your research design, prescreening to identify and eliminate high-risk participants, and providing participants with as much information as possible during informed consent and debriefing. You need to monitor participants' reactions, be alert for potential violations of confidentiality, and maintain scholarly integrity through the publication process.

Characteristics (C) / Assumptions (A) of Research

Control (C)
Holding constant or eliminating extraneous variables to establish cause-and-effect relationships.
Operationalism (C)
Defining scientific concepts by the specific operations used to measure them. This includes multiple operationalism, where constructs are represented by multiple measures.
Replication (C)
The reproduction of results from one study in additional studies to verify findings.
Uniformity or Regularity in Nature (A)
The assumption that there are consistent and lawful relationships in nature.
Reality in Nature (A)
The belief that the phenomena studied by scientists are real and observable.
Discoverability (A)
The assumption that these regularities and realities can be discovered through scientific investigation.

Research Approaches

Research Settings
Field Experiments, Laboratory Experiments, Internet Experiments
Field experiments (RS)
Artificiality is not a problem, but extraneous variables cannot be controlled as they can in a lab
Laboratory experiments (RS)
Ability to control extraneous variables, but introduces artificiality and poor ecological validity
Internet experiments (RS)
Easy access, large samples and low cost, but lack of experimenter control, self-selection, drop-out and multiple submissions from the same participant
Descriptive Research (T)
Observing, recording and describing behaviour
Relational/Predictive Research (T)
Describing and detecting/predicting relationships
Causal Research (T)
Describing behaviour, predicting relationships AND exploring cause-and-effect
Qualitative Research (A)
Non-numerical, interpretive approach. Assumes a dynamic, negotiated, socially constructed reality. Data are written or spoken words, observations of behaviour, or pictorial/visual material. Data analysis is thematic, with a focus on subjective/personal meaning
Quantitative Research (A)
Numerical data. Includes experimental and non-experimental approaches; sophisticated non-experimental approaches also attempt to identify causal relationships. Non-experimental findings can help identify factors/relationships that then form hypotheses to be tested with experimental research
Mixed Methods (A)
Mixes quantitative and qualitative research to give a more complete account
Quantitative Experimental
Before making a causal claim, three criteria must be met: co-variation (changes in the variables must be correlated), temporal ordering (the cause must precede the effect), and no alternative explanations
Between-subjects design
Different participants are exposed to each level of the IV
Within-subjects design
All participants are exposed to all levels of the IV. Can mitigate confounding participant variables, which helps better establish cause-and-effect. Best used with proper counterbalancing, because it is subject to carryover effects.
Advantages/Disadvantages of Experimental Research
Advantages: causal inference, ability to manipulate variables, control
Disadvantages: does not test the effects of extraneous variables, artificiality, inadequate as the only method of scientific inquiry
Quantitative Non-experimental
No manipulation of the IV. Includes descriptive research; identifies factors/relationships that form hypotheses to then be tested through experimental research
Types of Quantitative Non-experimental Research
Correlational study, Natural manipulation, Cross-sectional and longitudinal
Advantages/Disadvantages of Each Type
Correlational study: serves research objectives of description and prediction, but can lead to a false assumption of causation
Natural manipulation: serves research objectives of description and prediction, but can lead to a false assumption of causation
Cross-sectional and longitudinal: allow multiple groups/time points to be considered, but cross-sectional and longitudinal designs do not always produce similar results
Strengths/Weaknesses of Qualitative Research
Strengths: many different data collection methods, good for describing/understanding phenomena, provides data to develop theory
Weaknesses: difficult to generalise, varying interpretations, objective hypothesis-testing procedures not always used
Directional/One-tailed Hypothesis
Group A would have a higher mean on X than Group B. OR: There would be a positive/negative relationship between X and Y.
Non-directional/Two-tailed Hypothesis
Groups A and B would differ on X. OR: There would be a relationship between X and Y.
Null Hypothesis
A statement of no relationship among variables, or no differences between conditions.
Content Validity
Ensures the test covers the full range of the concept being measured.
Construct Validity
Measures how well the test reflects the theoretical concept it is designed to assess.
Criterion-Related Validity
Evaluates how well the test predicts outcomes based on another measure (the criterion).
Face Validity
Assesses whether the test appears to measure what it is supposed to measure, based on subjective judgment.
External Validity
Examines whether the study's results can be generalized to other settings, people, times, and measures.
Internal Validity
Ensures the study accurately measures the relationship between variables without interference from other factors.
Outcome Validity
Refers to how well a test or measure predicts or correlates with an outcome or behavior that it is supposed to influence or relate to in the real world. It is closely related to predictive validity but focuses on the practical implications of the test's results for real-world outcomes.
P-Value
The p-value is the probability of obtaining test results at least as extreme as those actually observed, assuming that the null hypothesis is true. The null hypothesis typically represents a statement of no effect or no difference.
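A sketch of the p-value definition using an exact binomial test (a technique chosen here for illustration, not taken from the lecture material). If H0 says a coin is fair, the one-tailed p-value for 9 heads in 10 flips is P(9 or more heads | H0). The coin example and the function name are hypothetical.

```python
from math import comb

def one_tailed_binomial_p(k, n, p0=0.5):
    """P(X >= k) for X ~ Binomial(n, p0): the probability of a result at least
    as extreme as k successes, assuming H0 (success rate p0) is true."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical data: 9 heads in 10 flips; H0 = the coin is fair
p = one_tailed_binomial_p(9, 10)
print(round(p, 4))  # 0.0107
```

Here p = 11/1024 ≈ .011, below α = .05, so H0 would be rejected. Because the distribution is symmetric when p0 = .5, the two-tailed p is double this (≈ .021).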
Experimental Research
Two defining features: first, the researcher manipulates the independent variable (the conditions); second, the researcher exerts control over variables other than the IV and DV (extraneous variables)
Statistical Validity
Concerns the proper statistical treatment of data and the soundness of the researchers' statistical conclusions
Non-experimental Research
Research that lacks manipulation of an IV and simply measures variables as they naturally occur. Use it when the research question concerns a single variable rather than a statistical relationship, when it concerns a non-causal statistical relationship, or when the IV cannot be manipulated
Types of Non-experimental Research
Correlational research (measuring two variables with little/no control over extraneous variables), Observational research (making observations of behaviour in a natural or laboratory setting without manipulating anything)
Counterbalancing
Testing different participants in different orders of the conditions. Complete counterbalancing (CB) is best, but random CB can be used when the number of conditions in an experiment is large.
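Complete versus random counterbalancing, sketched with Python's `itertools.permutations` and `random.shuffle`. The condition labels and seed are hypothetical. Complete counterbalancing needs k! orders for k conditions (3 conditions give 6 orders, but 6 conditions give 720), which is why random counterbalancing is used when there are many conditions.

```python
import itertools
import random

conditions = ["A", "B", "C"]  # hypothetical condition labels

# Complete counterbalancing: every possible order of the conditions is used
all_orders = list(itertools.permutations(conditions))
print(len(all_orders))  # 6, because 3! = 6

# Random counterbalancing: each participant gets an independently shuffled order
def random_order(conds, rng):
    order = list(conds)
    rng.shuffle(order)
    return order

rng = random.Random(42)  # fixed seed so the example is reproducible
orders = [random_order(conditions, rng) for _ in range(5)]  # 5 hypothetical participants
print(orders)
```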
Four Main Types of Validity
Internal validity, external validity, statistical validity and construct validity
Concurrent validity
When the criterion is measured at the same time as the construct
Predictive validity
When the criterion is measured at some point in the future (after the construct has been measured)
Convergent validity
Criteria can also include other measures of the same construct
Reliability
The consistency of a measure.
Three types of Consistency
Over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).
Statistical significance
Conclusion that an observed finding (e.g., a difference between groups or conditions) would be very unlikely if the null hypothesis were true. Practical significance (= clinical significance) is the claim made when a statistically significant finding seems large enough to be important.

Relationship between Variables

Statistical Methods to assess whether two things are related
Correlation, Chi-square and Regression
Scatterplots
Used to examine the relationship between 2 quantitative variables. X-axis: IV. Y-axis: DV.
Pearson's r correlation
Measures the degree and direction of the linear relationship between quantitative variables. r = 0 does NOT necessarily indicate the absence of a relationship, though (the relationship may be non-linear). Also known as bivariate correlation; it is based on the covariance between the variables.
Covariance
How much two variables vary together
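A minimal sketch of covariance and Pearson's r, assuming sample (n - 1) formulas: r is the covariance divided by the product of the two standard deviations. The data are hypothetical. The last lines illustrate the caveat above: a perfect U-shaped (non-linear) relationship still gives r = 0.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson's r: covariance standardised by both standard deviations."""
    sx = math.sqrt(covariance(x, x))  # covariance of a variable with itself = variance
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

# Hypothetical data: hours studied (x) and test score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 3))  # strong positive linear relationship

# Perfect but non-linear (U-shaped) relationship: y = x squared
x_sym = [-2, -1, 0, 1, 2]
y_curve = [v * v for v in x_sym]
print(pearson_r(x_sym, y_curve))  # 0.0
```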
Homoscedasticity
Error variance is assumed to be the same at all points along the linear relationship
Contingency Table
Used to examine the relationship between categorical variables
Pearson's r effect size classifications
r = .10 small effect; r = .30 medium effect; r ≥ .50 large effect
Correlation coefficient (Pearson's r): Proportion of variance
To calculate the proportion of variance in one variable that can be accounted for by variance in the second, square Pearson's r: r² = .01 small effect; r² = .09 medium effect; r² ≥ .25 large effect
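The cut-offs above as a small Python lookup, plus r² for the proportion of variance explained. The sign of r only gives direction, so the size label uses |r|. The "negligible" label for |r| < .10 is an assumption added for completeness; it is not one of the listed classifications.

```python
def classify_r(r):
    """Label the size of r using the cut-offs .10 / .30 / .50 (direction ignored)."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumption: not a category from the lecture material

def variance_explained(r):
    """Proportion of variance in one variable accounted for by the other (r squared)."""
    return r ** 2

for r in (0.10, -0.30, 0.50):
    print(r, classify_r(r), round(variance_explained(r), 2))
```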

Descriptive Statistics

Descriptive Statistics
Includes frequency distributions, graphic representations, central tendency and variability
Frequency Distributions
Data arrangement showing the frequency of each unique data value
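A frequency distribution can be built with `collections.Counter` from the Python standard library. The ratings below are hypothetical.

```python
from collections import Counter

# Hypothetical ratings on a 1-5 scale
ratings = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]
freq = Counter(ratings)

# Print each unique value with its frequency, in ascending order of value
for value in sorted(freq):
    print(value, freq[value])
```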
Graphic Representations
Bar graphs, histograms, line graphs, scatterplots
Bar Graphs
Vertical bars used to depict frequencies of a categorical independent variable (e.g., both groups)
Histograms
Used to depict the frequencies and distribution of a quantitative variable. X-axis is the quantitative variable, y-axis is frequency
Line graph
Shows the trend in quantitative data by connecting data points with a line. X-axis quantitative, y-axis frequencies.
Line graphs can also be used to show interaction effects (e.g., pre-test/post-test data): the x-axis is a categorical variable and the y-axis shows the dependent variable (e.g., group means), with a separate line for each group
Scatterplot
Graphical depiction of the relationship between 2 quantitative variables. X-axis IV, Y-axis DV.
Central Tendency
Tells us what is typical for a quantitative variable, through the mean, median and mode
Variability
Tells us how spread out the values of a quantitative variable are
Mode
The most frequently occurring value; the only measure of central tendency suitable for nominal (categorical) data. In a normal distribution (symmetrical around the mean), the mode, median and mean are equal.
Median
Center point of an ordered set of numbers
Mean
Arithmetic average
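The three measures of central tendency via Python's standard `statistics` module (`mean`, `median`, `mode`). The scores are hypothetical.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical scores

print(statistics.mean(data))    # arithmetic average
print(statistics.median(data))  # centre of the ordered data (average of the two middle values here)
print(statistics.mode(data))    # most frequent value
```

Here the mean (5) sits above the median (4) because the single high score of 10 pulls the mean upward; the median is less affected by extreme scores.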
3 types of Variability
Range, Variance, Standard Deviation
Range
Highest data score minus lowest data score
Variance and Standard Deviation
Superior to the range because they take into account ALL of the data values and provide information about dispersion. Variance = the average squared deviation of data values from their mean (in squared units). Standard deviation = the square root of the variance.
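Range, variance and standard deviation computed by hand in Python. The cheat sheet does not say which denominator the course uses, so both are shown as an assumption: divide by N for the population variance, or by n - 1 for the usual sample estimate. The function name and scores are hypothetical.

```python
import math

def describe_spread(scores):
    """Return range, population and sample variance, and the matching SDs."""
    n = len(scores)
    mean = sum(scores) / n
    squared_devs = [(x - mean) ** 2 for x in scores]
    data_range = max(scores) - min(scores)
    pop_var = sum(squared_devs) / n           # population variance (divide by N)
    sample_var = sum(squared_devs) / (n - 1)  # sample estimate (divide by n - 1)
    return {
        "range": data_range,
        "variance_pop": pop_var,
        "sd_pop": math.sqrt(pop_var),
        "variance_sample": sample_var,
        "sd_sample": math.sqrt(sample_var),
    }

print(describe_spread([2, 4, 4, 4, 5, 5, 7, 9]))
```

For these scores the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the SD is 2.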

Six Data Collection Methods

Observations
Researcher watches and records events/behaviours. Naturalistic or laboratory observations
Strengths: provides first-hand information, allows study of natural behaviour, captures non-verbal cues, usually exploratory/open-ended
Weaknesses: reactive effects if respondents know they are being observed, investigator effects (personal bias), data analysis is time-consuming
Questionnaires
Measure participants' opinions and provide self-reported demographic information. Closed-ended or open-ended questionnaires
Strengths: efficient for large samples, standardised format for easy comparison
Weaknesses: response bias, limited depth of information, potential for misinterpretation
Existing Data
Data that were left behind or collected for a different purpose before the current research. Documents, physical data, etc.
Strengths: cost-effective, time-saving, allows for longitudinal studies
Weaknesses: data may be incomplete/outdated, lack of control over data collection methods
Interview
Can be conducted through multiple media (face-to-face, phone, etc.). Can be synchronous (happens in real time) or asynchronous (over time)
Strengths: good for measuring attitudes, allows for probing, in-depth information, useful for hypothesis testing
Weaknesses: people might not recall important information, reactive effects, investigator effects, expensive and time-consuming
Focus Groups
Collection of data in a group situation where a moderator leads a discussion with a small group
Strengths: useful for exploring ideas and concepts, provides a window into internal thinking, in-depth information, can be taped
Weaknesses: can be expensive, difficult to find a good moderator, reactive and investigator effects, low measurement validity
Tests
Data collection instruments designed to measure something. Standardised (existing, tested in previous research) or researcher-constructed (new, often specifically developed to test for variables)
Strengths: provide measures of many characteristics, usually already developed, reference data available, easy data analysis
Weaknesses: can be expensive, reactive participant effects, might not be appropriate for certain samples, open-ended questions not available

Inferential Statistics

Inferential Statistics
Allows researchers to make generalizations about a population based on sample data. It helps in estimating population parameters and testing hypotheses.
Sampling Error
The difference between a sample statistic and the actual population parameter. It occurs naturally in sampling and must be understood for accurate data interpretation.
Sampling Distributions
Probability distributions that can be constructed for any sample statistic (e.g., the distribution of means from many samples of the same size)
Estimation
Using sample data to estimate population parameters. There are two types:
Point Estimation
Provides a single-value estimate of a population parameter.
Interval Estimation
Provides a range (confidence interval) within which the parameter is expected to lie.
Confidence Intervals
A confidence interval gives a range of values that is likely to contain the population parameter with a certain level of confidence (e.g., 95%).
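A sketch of a confidence interval for a mean, assuming the normal approximation M ± z × SE (z ≈ 1.96 for 95%), with `statistics.NormalDist().inv_cdf` supplying the critical value. With a sample this small a t critical value would normally be used instead, so treat this as illustrative only. The function name and scores are hypothetical.

```python
import math
import statistics

def mean_ci(sample, confidence=0.95):
    """Approximate CI for a population mean using the normal (z) critical value.
    Assumption: large-sample approximation; small samples usually use a t value."""
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)                     # standard error of the mean
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return m - z * se, m + z * se

low, high = mean_ci([4, 8, 6, 5, 3, 7, 9, 6])
print(round(low, 2), round(high, 2))
```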
Null Hypothesis Significance Testing (NHST)
A method for testing a hypothesis by determining the probability of observing the sample data (or more extreme data) if the null hypothesis is true. It involves setting a significance level (alpha) in advance to decide whether to reject the null hypothesis.
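The NHST decision rule as a Python sketch, assuming the common convention of rejecting H0 when p ≤ α (some texts use p < α). It also derives α as 1 - confidence level. The p-values and function name are hypothetical.

```python
def nhst_decision(p_value, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise fail to reject (never 'accept') H0."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

confidence_level = 0.95
alpha = round(1 - confidence_level, 10)  # 1 - .95 = .05 (rounded to remove float noise)

print(nhst_decision(0.012, alpha))  # reject H0
print(nhst_decision(0.240, alpha))  # fail to reject H0
```

Rejecting a true H0 here would be a Type I error; failing to reject a false H0 would be a Type II error.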
Type I Error
Occurs when the null hypothesis is incorrectly rejected (false positive).
Type II Error
Occurs when the null hypothesis is not rejected although it is false (false negative).

Sampling

Sampling
Can be qualitative or quantitative
Statistic
A numerical characteristic of sample data
Parameter
A numerical characteristic of a population
Sampling error
Differences between sample values and the true population parameters. There is always some degree of sampling error. If you need 0 error, you can't sample: you must conduct a census (collect data from everyone in the population).
Sampling frame
A list of all the elements in a population
Response rate
The percentage of individuals selected to be in the sample who actually participate in the study
Quantitative Sampling
Can be random or non-random
Random Sampling
Use when your goal is to generalize findings to a larger population and you need to minimize bias. A random process is used to select members of the population for inclusion in the sample, so all members of the population have an equal chance of inclusion. Can only be used if we can identify every member of the population. Closely tied to the external validity of research; reduces bias while increasing generalisability and statistical validity. Tends to produce a representative sample.
Non-Random Sampling
Use when studying specific subgroups, when resources are limited, or when the research requires depth over breadth. Participants are selected for inclusion in the sample non-randomly, so members of the population do NOT all have an equal chance of being included. It is cost-effective and efficient and allows targeted sampling of a subgroup without a full list of the population. Produces a biased sample.
Random Sampling Types
Simple Sampling, Systematic Sampling, Stratified Sampling, Cluster Sampling
Non-Random Sampling Types
Opportunity/Convenience, Quota, Purposive, Snowball
Simple Sampling
Pure mathematical (random) selection from the whole population
Systematic Sampling
Starting at a random point and then selecting every Nth case
Stratified Sampling
Random sampling from homogeneous strata (subgroups) of the population
Cluster Sampling
Random sampling from randomly selected clusters (groups)
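Simple random and systematic sampling sketched with Python's `random` module. The sampling frame of 100 IDs and the seed are hypothetical, and the interval k = N / n assumes N divides evenly by n.

```python
import random

population = [f"P{i:03d}" for i in range(1, 101)]  # hypothetical sampling frame of 100 people
rng = random.Random(7)  # seeded for reproducibility

# Simple random sampling: every member has an equal chance of selection
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every k-th case (k = N / n)
def systematic_sample(frame, n, rng):
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random starting point within the first interval
    return frame[start::k][:n]   # every k-th element from the start

systematic = systematic_sample(population, 10, rng)
print(simple)
print(systematic)
```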
Opportunity/Convenience Sampling
Selecting individuals based on availability
Quota Sampling
Seeking out a specific number of cases in predetermined categories using non-random methods
Purposive Sampling
Using a range of methods to obtain participants with specific characteristics
Snowball Sampling
Identifying further cases from elements (participants) already in the sample
EPSEM
Equal probability selection method (EPSEM): choosing a sample in a manner in which everyone has an equal chance of being selected
Proportional stratified sampling
The sample proportions are made to be the same as the population proportions. IS an EPSEM
Disproportional stratified sampling
The sample proportions are made to be different from the population proportions. NOT an EPSEM
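A sketch of proportional versus disproportional stratified sampling. Strata names, sizes and the seed are hypothetical; in general, rounding the per-stratum sizes may not sum exactly to n.

```python
import random

# Hypothetical population split into two strata
strata = {
    "first_year": [f"F{i}" for i in range(600)],
    "final_year": [f"L{i}" for i in range(400)],
}

def proportional_allocation(strata, n):
    """Per-stratum sample sizes matching the population proportions (EPSEM)."""
    total = sum(len(members) for members in strata.values())
    return {name: round(n * len(members) / total) for name, members in strata.items()}

def stratified_sample(strata, allocation, rng):
    """Simple random sample of the allocated size within each stratum."""
    return {name: rng.sample(strata[name], k) for name, k in allocation.items()}

rng = random.Random(1)
prop = proportional_allocation(strata, 50)        # {'first_year': 30, 'final_year': 20}
disprop = {"first_year": 25, "final_year": 25}    # equal sizes: not proportional, not EPSEM
sample = stratified_sample(strata, prop, rng)
oversampled = stratified_sample(strata, disprop, rng)
```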
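One-stage cluster sampling
Randomly selecting clusters and using all individuals within them. E.g., randomly select 15 psychology classrooms and use all individuals in each classroom
Two-stage cluster sampling
Randomly selecting clusters (stage 1), then randomly selecting individuals from within each selected cluster (stage 2)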
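One-stage versus two-stage cluster sampling, sketched with hypothetical classrooms as clusters. Selecting 15 classrooms mirrors the example above; the 40 classrooms of 25 students and the 5 students per classroom in stage 2 are arbitrary assumptions.

```python
import random

# Hypothetical: 40 classrooms (clusters), each with 25 students
classrooms = {f"class_{c}": [f"class_{c}_s{s}" for s in range(25)] for c in range(40)}
rng = random.Random(3)

# Stage 1 (both designs): randomly select 15 clusters
chosen = rng.sample(sorted(classrooms), 15)

# One-stage: use every individual in each selected cluster
one_stage = [student for c in chosen for student in classrooms[c]]

# Two-stage: randomly sample 5 individuals within each selected cluster
two_stage = [student for c in chosen for student in rng.sample(classrooms[c], 5)]

print(len(one_stage), len(two_stage))
```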
Qualitative Sampling
Usually purposive; can include: maximum variation sampling, extreme-case sampling, homogeneous sample selection, typical-case sampling, critical-case sampling, negative-case sampling, opportunistic sampling
Random Assignment
Using a random process to allocate units/elements of the sample to levels of an independent variable. Closely tied to the internal validity of research. Its primary use is to address group non-equivalence (it makes the groups equivalent, on average, before the IV is introduced).
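A minimal random-assignment sketch: shuffle the sample, then deal participants into conditions in turn so group sizes stay balanced. This shuffle-and-deal approach is one common option, not necessarily the procedure used in the course; the function name, participant IDs and seed are hypothetical.

```python
import random

def random_assignment(participants, conditions, rng):
    """Shuffle the sample, then deal participants into conditions in turn,
    so group sizes differ by at most one."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

sample = [f"S{i}" for i in range(1, 21)]  # hypothetical sample of 20
groups = random_assignment(sample, ["control", "treatment"], random.Random(11))
print(groups)
```

Contrast with random sampling: random sampling decides who is in the study (external validity), whereas random assignment decides which condition each participant receives (internal validity).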