Cheatography

Statistics in Behavioral Sciences Cheat Sheet by Sana_H

Statistics in Behavioral Sciences: parametric and non-parametric tests

Statistics

The branch of mathematics in which data are used descriptively or inferentially to find or support answers for scientific and other quantifiable questions. It encompasses various techniques and procedures for recording, organizing, analyzing, and reporting quantitative information.

Differences: parametric vs non-parametric tests

Property | Parametric | Non-parametric
Assumptions about the population distribution | Yes | No
Measure of central tendency | Mean | Median/mode
Probability distribution | Normal | Any (not specified)
Population knowledge | Required | Not required
Used for | Interval (and ratio) data | Nominal and ordinal data
Correlation | Pearson | Spearman
Tests | t-test, z-test, F-test, ANOVA | Kruskal-Wallis H test, Mann-Whitney U test, chi-square test

Correlation Coefficient

A statistical measure of the strength of the relationship between the relative movements of two variables. Values range from -1 to +1:
- -1 = perfect negative (inverse) correlation
- +1 = perfect positive (direct) correlation
- 0 = no linear relationship

Alternatives

Parametric test | Non-parametric alternative
One-sample z-test, one-sample t-test | One-sample sign test
One-sample z-test, one-sample t-test | One-sample Wilcoxon signed-rank test
Two-way ANOVA (one factor plus a blocking factor, e.g. repeated measures) | Friedman test
One-way ANOVA | Kruskal-Wallis test
One-way ANOVA | Mood's median test
Independent-samples t-test | Mann-Whitney U test
Pearson correlation | Spearman correlation

Paired t-test

Compares the means of two related groups, e.g. the weight of 20 mice before and after treatment.
Two conditions:
- pre- and post-treatment
- two different conditions, e.g. two drugs
ASSUMPTIONS
- random selection
- differences are approximately normally distributed
- no extreme outliers
FORMULA: $t = \frac{m}{s/\sqrt{n}}$, where m = sample mean of the differences, s = SD of the differences, n = number of pairs; $df = n - 1$
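The paired formula can be sketched in Python using only the standard library. This is an illustration, not a reference implementation: the function name `paired_t` and the mouse weights are hypothetical, and the sketch returns only the statistic and df (looking up the p-value would need a t-table or a library such as SciPy).

```python
import math
import statistics


def paired_t(before, after):
    """Paired t-test statistic t = m / (s / sqrt(n)) and df = n - 1."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # after - before, per subject
    n = len(diffs)                  # number of pairs
    m = statistics.mean(diffs)      # sample mean of the differences
    s = statistics.stdev(diffs)     # sample SD of the differences (n - 1 denominator)
    return m / (s / math.sqrt(n)), n - 1


# Hypothetical weights (g) of 5 mice before and after treatment
before = [20.0, 22.0, 19.0, 24.0, 25.0]
after = [22.0, 23.0, 21.0, 25.0, 28.0]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The test works on the differences only, which is why the normality assumption applies to the differences rather than to each set of scores.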

t-distribution

Also called Student's t-distribution: a probability distribution similar to the normal distribution but with heavier tails, used to estimate population parameters from small samples.
- tail heaviness is determined by the degrees of freedom: compared with the normal distribution it gives lower probability to the centre and higher probability to the tails (higher kurtosis)
- symmetrical, unimodal, centred at 0, with a larger spread around 0
- df = n - 1; above about 30 df the z-distribution is a close approximation
- t-score = number of standard errors a sample mean lies from the hypothesized mean
From the t-distribution we find:
- upper and lower boundaries (critical values)
- the p-value
TO BE USED WHEN:
- the sample is small
- the population SD is unknown
ASSUMPTIONS
- continuous or ordinal scale
- random selection
- normal probability curve (NPC)
- equal SDs for the independent two-sample t-test
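The heavier tails can be checked numerically. This sketch implements the textbook density of Student's t with `math.gamma` and compares it with the standard normal from `statistics.NormalDist`; the helper name `t_pdf` is hypothetical, and `math.gamma` overflows for very large df (a few hundred), so it is only meant for the small-df range discussed above.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


normal = NormalDist()  # standard normal: mean 0, SD 1
for df in (2, 5, 30):
    print(f"df={df:>2}: centre {t_pdf(0, df):.4f} vs {normal.pdf(0):.4f}, "
          f"tail x=3 {t_pdf(3, df):.4f} vs {normal.pdf(3):.4f}")
```

Small df gives a lower peak and fatter tails; as df grows the two curves converge, which is why the z-distribution is used for large samples.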

Two-sample z-test

Determines whether the means of two independent populations are equal or different, i.e. whether there is a significant difference between two populations, by comparing sample means.
Requires: known SDs and a sample size > 30 in each group.
e.g. comparing the performance of two groups of students, average salaries, employee performance, IQ, etc.
FORMULA: $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, where s = SD
General formula: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$, where $(\mu_1 - \mu_2)$ = hypothesized difference between the population means
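A minimal sketch of the general two-sample z formula, assuming the population SDs are known. The function name `two_sample_z` and the IQ-style numbers are hypothetical; the two-tailed p-value uses the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, hyp_diff=0.0):
    """z = ((x̄1 - x̄2) - (µ1 - µ2)) / sqrt(σ1²/n1 + σ2²/n2), plus two-tailed p."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # standard error of the difference
    z = ((mean1 - mean2) - hyp_diff) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical IQ comparison: two groups of 50, known SD of 15 in each
z, p = two_sample_z(mean1=104, mean2=100, sd1=15, sd2=15, n1=50, n2=50)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With `hyp_diff=0` this reduces to the first formula above (testing µ1 = µ2).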

Point-biserial correlation

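Measures the relationship between one continuous variable (ratio/interval scale) and one naturally binary (dichotomous) variable; $r_{pb}$ = correlation coefficient.
FORMULA: $r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{pq}$, where $M_1$, $M_0$ = means of the groups coded 1 and 0, $s_n$ = SD of all scores, p, q = proportions of cases in groups 1 and 0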
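The point-biserial formula translates directly into a few lines of Python. This is a sketch under stated assumptions: the binary variable is coded 0/1, $s_n$ is taken as the population-style SD (n in the denominator, `statistics.pstdev`), and the function name `point_biserial` and the data are hypothetical. With that coding the result equals Pearson's r computed on the same pairs.

```python
import math
import statistics


def point_biserial(values, groups):
    """r_pb = (M1 - M0) / s_n * sqrt(p * q), with groups coded 0/1."""
    g1 = [v for v, g in zip(values, groups) if g == 1]
    g0 = [v for v, g in zip(values, groups) if g == 0]
    p = len(g1) / len(values)          # proportion coded 1
    q = 1 - p                          # proportion coded 0
    s_n = statistics.pstdev(values)    # SD of all scores, n in the denominator
    return (statistics.mean(g1) - statistics.mean(g0)) / s_n * math.sqrt(p * q)


# Hypothetical scores and a naturally binary variable (0/1)
r = point_biserial([2, 4, 6, 8], [0, 0, 1, 1])
print(f"r_pb = {r:.4f}")
```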


z-test

Hypothesis test used to check whether a sample mean differs from a population mean, or whether the means of two populations are equal, when the population variance is known.
Requires knowledge of:
- population SD/variance, and/or
- a sample size of 30 or more
(if both are unknown -> t-test)
Forms: left-tailed, right-tailed, two-tailed
Reject the null hypothesis if the z-statistic is statistically significant, i.e. it falls beyond the critical value.
- z-statistic/z-score = the number resulting from the z-test
- the z critical value divides the graph into acceptance and rejection regions; if the z-statistic falls in the rejection region -> H0 can be rejected
TYPES: one-sample z-test, two-sample z-test
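The critical-value rule for the three tail types can be sketched with `statistics.NormalDist().inv_cdf`. The helper names `z_critical` and `reject_h0` and the `tail` strings are hypothetical conventions chosen for this illustration.

```python
from statistics import NormalDist


def z_critical(alpha, tail="two"):
    """Critical z value(s) bounding the rejection region(s) at significance level alpha."""
    std = NormalDist()
    if tail == "two":
        c = std.inv_cdf(1 - alpha / 2)
        return -c, c                      # reject H0 if z < -c or z > c
    if tail == "right":
        return std.inv_cdf(1 - alpha)     # reject H0 if z > value
    if tail == "left":
        return std.inv_cdf(alpha)         # reject H0 if z < value
    raise ValueError("tail must be 'two', 'left' or 'right'")


def reject_h0(z, alpha=0.05, tail="two"):
    """True if z falls in the rejection region."""
    crit = z_critical(alpha, tail)
    if tail == "two":
        return z < crit[0] or z > crit[1]
    if tail == "right":
        return z > crit
    return z < crit


print(z_critical(0.05))                  # two-tailed bounds, about ±1.96
print(reject_h0(2.1), reject_h0(1.5))
```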

ANOVA

Analysis of Variance: compares several sets of scores to test whether the means of 3 or more groups are equal.
- compares the variance between groups with the variance within groups, to check whether the sample groups are affected by the same factors and to the same degree
ONE-WAY ANOVA ("one-way" = number of IVs): a single IV with two or more levels/variations; tests whether it has a measurable effect on the DV by comparing the means of two or more independent groups
aka:
- one-factor ANOVA
- one-way analysis of variance
- between-subjects ANOVA
Assumptions
- independent samples
- equal sample sizes in the groups/levels (balanced design; preferred but not strictly required)
- normally distributed
- equal variance
An F-test checks statistical significance: the higher the F value, the smaller the p-value and the less likely the observed difference is due to chance alone.
Used in field studies, experiments, quasi-experiments.
CONDITIONS:
- minimum 6 subjects
- ideally the same number of samples in each group
H0: $\mu_1 = \mu_2 = \dots = \mu_k$, i.e. all population means are equal
Ha: at least one $\mu_i$ is different, i.e. at least one of the k population means is not equal to the others, where $\mu_i$ is the population mean of group i
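The between/within comparison behind the F-test can be sketched as follows. This computes only the F statistic and its degrees of freedom (the p-value would come from an F-table or a library such as SciPy); the function name `one_way_anova_f` and the scores are hypothetical.

```python
import statistics


def one_way_anova_f(*groups):
    """Return F = MS_between / MS_within with (df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Between groups: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far each score lies from its own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three levels of one IV
f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

Identical group means give F = 0; the further apart the group means are relative to the spread inside each group, the larger F becomes.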

Spearman Correlation

Non-parametric version of the Pearson correlation coefficient, named after Charles Spearman and denoted by ρ (rho); aka Spearman rank-order test.
Determines the strength and direction of the monotonic relationship between two variables measured at ordinal, interval or ratio level, and whether they are correlated at all.
Monotonic function = one variable never increases or never decreases as the other variable increases:
- monotonically increasing = as X increases, Y never decreases
- monotonically decreasing = as X increases, Y never increases
- not monotonic = as X increases, Y sometimes decreases and sometimes increases
For analysis with ordinal data or continuous data; uses ranks, so normality is not assumed.
FORMULA: $\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $d_i$ = difference between the two ranks of each observation (exact when there are no tied ranks)
Ranges from -1 to +1:
- +1 = perfect association of ranks
- 0 = no association
- -1 = perfect negative association of ranks
The closer the value is to 0, the weaker the association.
Value ranges (for |ρ|):
- 0 to 0.3 = weak monotonic relationship
- 0.4 to 0.6 = moderate monotonic relationship
- 0.7 to 1 = strong monotonic relationship
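The rank-difference formula can be sketched in a few lines. Because the shortcut formula is only exact without ties, this hypothetical `ranks` helper simply rejects tied values (with ties, the usual approach is average ranks plus Pearson's r on the ranks); `spearman_rho` and the data are likewise illustrative.

```python
def ranks(values):
    """Ranks 1..n; rejects ties, where the shortcut formula is not exact."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: use average ranks and Pearson r on the ranks")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]   # O(n²), fine for small examples


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i²) / (n (n² - 1)), d_i = rank difference of observation i."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [2, 1, 4, 3, 5]))
print(spearman_rho(x, [1, 4, 9, 16, 25]))   # monotonic but not linear
```

The second call returns a perfect +1 even though the relationship is curved, which is the key difference from Pearson's r.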

Parametric and Non-parametric tests

PARAMETRIC: fixed set of parameters and certain assumptions about the distribution of the population; requires prior knowledge of the population distribution, i.e. NORMAL DISTRIBUTION
NON-PARAMETRIC: no assumptions, do not depend on the population distribution (DISTRIBUTION-FREE tests); values measured at nominal or ordinal level; easy to apply and understand, low complexity
The choice of test is based on the distribution of the population and the sample size:
- parametric: based on the mean; with small samples (< 30) the data must be normally distributed
- non-parametric: based on the median/mode; usable regardless of sample size

Non-parametric tests
ADVANTAGES
- simple, easy to understand
- few assumptions (no assumption about the population distribution)
- more versatile
- easier to calculate
- hypothesis tested may be more accurate
- small sample sizes are okay
- can be used for all types of data (nominal, ordinal, interval)
- can be used with data that have outliers
DISADVANTAGES
- less powerful than parametric tests: the parametric counterpart, if one exists, is more powerful
- not as efficient as parametric tests
- may waste information
- require a larger sample to be as powerful as a parametric test
- difficult to compute by hand for large samples
- require data in a tabular format that may not be readily available

Application

PARAMETRIC TESTS | NON-PARAMETRIC TESTS
quantitative & continuous data | mixed data
normally distributed data | unknown distribution of population
data measured on ratio or interval scales | different kinds of measurement scales

degrees of freedom

Number of independent values in the data sample that are free to vary.
FORMULA (one sample, when the mean is estimated): number of values in the data set minus 1, df = N - 1
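Why one value is "lost" can be shown with a tiny worked example: deviations from the mean always sum to zero, so once the mean and N - 1 values are known, the last value is forced. The data below are hypothetical.

```python
# Hypothetical sample of N = 4 scores
data = [4, 7, 5, 8]
n = len(data)
mean = sum(data) / n                      # 6.0

# Deviations from the mean always sum to zero ...
deviations = [x - mean for x in data]

# ... so once the mean and the first N - 1 values are fixed,
# the last value is forced: only N - 1 values are free to vary.
free_values = data[:-1]
forced_last = mean * n - sum(free_values)

df = n - 1
print(f"sum of deviations = {sum(deviations)}, forced last value = {forced_last}, df = {df}")
```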

t-test

Statistical test to determine whether there is a significant difference between the average scores of two groups.
Introduced in 1908 by William Sealy Gosset, who published under the pseudonym "Student" (hence Student's t-test and t-distribution); used for hypothesis testing.
Requires:
- normally distributed data
- no knowledge of the population SD
TYPES:
- one-sample t-test: single group; FORMULA: $t = \frac{m - \mu}{s/\sqrt{n}}$, where m = sample mean
  SD FORMULAS: population $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$; sample $s = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$
- independent two-sample t-test: two groups
- paired/dependent-samples t-test: significant difference in paired measurements; compares means from the same group at different times (test-retest sample)
H0: no effective difference, i.e. the measured difference is due to chance
Ha: two-tailed (means not equal) or one-tailed (mean smaller or larger than the hypothesized mean)
PERFORM
- two-tailed test: to find out whether two populations differ
- one-tailed test: to find out whether one population mean is > or < the other
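The one-sample formula and the two SD formulas map onto the standard library: `statistics.pstdev` uses N in the denominator (σ) and `statistics.stdev` uses n - 1 (s). The function name `one_sample_t` and the data are hypothetical, and the sketch stops at the statistic and df.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """t = (m - µ) / (s / sqrt(n)), using the sample SD s (n - 1 denominator)."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)   # s = sqrt(Σ(x - x̄)² / (n - 1))
    return (m - mu) / (s / math.sqrt(n)), n - 1


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(data))   # σ formula, N in the denominator
print(statistics.stdev(data))    # s formula, n - 1 in the denominator
print(one_sample_t(data, mu=4))  # (t, df)
```

The sample SD is slightly larger than the population SD for the same data, which makes the t-test a little more conservative when σ has to be estimated.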

Independent two-sample t-test

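aka unpaired t-test: compares the means of two independent groups, e.g. the average weight of males and females.
Two forms:
- Student's t-test: assumes equal SDs (pooled variance)
- Welch's t-test: less restrictive, does not assume equal SDs; its df is approximated by the Welch-Satterthwaite equation
Both give similar results, especially with equal sample sizes.
ASSUMPTIONS:
- normally distributed
- same SD (Student's form)
- independent groups
- randomly selected
- independent observations
- measured on interval or ratio scale
FORMULA (Welch): $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
FORMULA (Student, pooled): $t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{1/n_1 + 1/n_2}}$, with $S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$ and $df = n_1 + n_2 - 2$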
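Both forms of the statistic can be sketched side by side. The function names `welch_t` and `student_t` and the scores are hypothetical; the Welch-Satterthwaite df and p-values are left out to keep the example short.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t: (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2); equal SDs not assumed."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def student_t(a, b):
    """Student's pooled t: (x̄1 - x̄2) / (S * sqrt(1/n1 + 1/n2)), df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    s_pooled = math.sqrt(ss / (n1 + n2 - 2))
    t = (m1 - m2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical scores of two independent groups
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
print(welch_t(group_a, group_b), student_t(group_a, group_b))
```

With equal sample sizes the two statistics coincide algebraically (the pooled variance becomes the average of the two variances); only their df differ. With unequal sizes and unequal SDs they can diverge, which is when Welch's form is preferred.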

One-sample z-test

Checks whether a sample mean differs significantly from the population mean when the population SD is known.
FORMULA: $z = \frac{\bar{x} - \mu}{SE}$, $SE = \frac{\sigma}{\sqrt{n}}$
The z-score is compared with a z-table (which gives the % of the area under the NPC between the mean and the z-score) to tell whether the result is likely due to chance.
CONDITIONS: knowledge of:
- population mean
- population SD
- simple random sample
- normal distribution
Two approaches to rejecting H0:
- p-value approach: the p-value is the smallest significance level at which H0 can be rejected; the smaller the p-value, the stronger the evidence against H0
- critical value approach: compare the z-statistic with the critical values, which mark the boundary of the regions where the statistic is highly improbable to lie (critical/rejection regions); if the z-statistic is in a critical region -> reject H0
Critical values depend on the significance level α (e.g. 0.10, 0.05, 0.01) and on Ha (one- or two-tailed).
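The p-value approach for a one-sample z-test can be sketched with the standard library's `NormalDist`, which replaces the z-table lookup. The function name `one_sample_z` and the numbers (µ = 100, σ = 15, n = 36) are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu, sigma, n):
    """z = (x̄ - µ) / SE with SE = σ / sqrt(n); returns z and the two-tailed p-value."""
    se = sigma / math.sqrt(n)
    z = (sample_mean - mu) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails beyond |z|
    return z, p


z, p = one_sample_z(sample_mean=105, mu=100, sigma=15, n=36)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {p < alpha}")
```

Here z = 2.00 exceeds the two-tailed critical value of about 1.96, so the critical value approach reaches the same decision as p < α.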

Biserial correlation

Measures the relationship between a quantitative variable and a binary variable; introduced by Pearson in 1909.
The biserial correlation coefficient ranges from -1 to +1 (sample estimates can fall outside this range when the normality assumption is violated); 0 = no association.
e.g. IQ scores and pass/fail
Correlates a continuous variable with a binary variable that was created by dichotomising an underlying continuous variable.
$r_{bis}$ or $r_b$ = correlation index estimating the strength of the relationship between an artificially dichotomous variable and a truly continuous variable
ASSUMPTIONS:
- data measured on a continuous scale
- one variable to be made dichotomous
- no outliers
- approximately normally distributed
- equal variances (SD)
FORMULA: $r_b = \frac{M_1 - M_0}{SD_t} \cdot \frac{pq}{y}$, where $M_1$ = mean of group 1, $M_0$ = mean of group 0, p = proportion in group 1, q = proportion in group 0, $SD_t$ = total SD, y = ordinate (height) of the standard normal curve at the point dividing the areas p and q
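The biserial formula needs the ordinate y, which the standard library can supply: find the z that cuts off an upper area p (`inv_cdf(q)`) and take the normal density there. This is a sketch under assumptions: groups are coded 0/1 after dichotomising, $SD_t$ is taken as the population-style SD (`pstdev`), and the function name `biserial` and the data are hypothetical.

```python
import statistics
from statistics import NormalDist


def biserial(values, groups):
    """r_b = (M1 - M0) / SD_t * (p * q / y), groups coded 0/1 after dichotomising."""
    g1 = [v for v, g in zip(values, groups) if g == 1]
    g0 = [v for v, g in zip(values, groups) if g == 0]
    p = len(g1) / len(values)            # proportion in group 1
    q = 1 - p                            # proportion in group 0
    sd_t = statistics.pstdev(values)     # assumption: total SD with n in the denominator
    std = NormalDist()
    y = std.pdf(std.inv_cdf(q))          # normal ordinate at the point splitting p and q
    return (statistics.mean(g1) - statistics.mean(g0)) / sd_t * (p * q / y)


# Hypothetical test scores; 1 = pass, 0 = fail (an artificial split of ability)
values = [1, 5, 3, 7, 4, 6]
groups = [0, 0, 1, 1, 0, 1]
print(f"r_b = {biserial(values, groups):.4f}")
```

Since $pq/y$ is larger than $\sqrt{pq}$, the biserial coefficient is always larger in magnitude than the point-biserial one for the same data.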

Pearson Correlation

Measures the strength and direction of a linear relationship between two variables, i.e. how two data sets are correlated; the sign of r gives the direction of the slope of the best-fit line (r itself is not the slope).
aka:
- Pearson's r
- bivariate correlation
- Pearson product-moment correlation coefficient (PPMCC)
Cannot tell which variable depends on the other, and cannot assess nonlinear associations.
r value variation:
- -0.1 to -0.3 / 0.1 to 0.3 = weak correlation
- -0.3 to -0.5 / 0.3 to 0.5 = average/moderate correlation
- -0.5 to -1.0 / 0.5 to 1.0 = strong correlation
FORMULA: $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
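The computational formula can be sketched directly from the sums it uses. The function name `pearson_r` and the data are hypothetical; the last call shows a perfect but nonlinear (U-shaped) relationship for which r is 0, illustrating why r cannot assess nonlinear associations.

```python
import math


def pearson_r(x, y):
    """r = (nΣxy - ΣxΣy) / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


print(pearson_r([1, 2, 3], [1, 3, 2]))
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfect positive line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x², yet r = 0
```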

Mann-Whitney U test

Non-parametric test of the significance of the difference between two independently drawn groups, i.e. compares outcomes between two independent groups; the non-parametric equivalent of the unpaired t-test.
aka: Mann-Whitney test, Wilcoxon rank-sum test
CONDITIONS:
- no NPC (normality) assumption
- small samples are acceptable, with a minimum of 5 in each group
- continuous data (able to take any value in a range)
- randomly selected samples
H0: the two populations are equal
Ha: the two populations are not equal
Denoted by U
FORMULA: $U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$, $U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$, where R = sum of ranks of the group; the test statistic is the smaller of $U_1$ and $U_2$
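The U formulas can be sketched by ranking the pooled scores (tied values share the average of their ranks), summing the ranks per group, and plugging into $U_1$ and $U_2$. The function names `average_ranks` and `mann_whitney_u` are hypothetical, and the sketch stops at U (significance would come from a U-table or a library).

```python
def average_ranks(values):
    """Rank pooled values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1           # mean of positions i..j as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2, U) with U = min(U1, U2)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])                # R1: rank sum of group 1
    r2 = sum(ranks[n1:])                # R2: rank sum of group 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))   # complete separation
print(mann_whitney_u([1, 2, 2], [2, 3, 4]))   # with a tie
```

A useful check is that $U_1 + U_2 = n_1 n_2$ always holds; a U of 0 means the two groups do not overlap at all.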