Hypothesis Testing with SciPy Cheat Sheet

1 Sample T-Test

For numerical data.

Compares a sample mean to a hypothetical population mean.

The null hypothesis, in this case, is that the sample comes from a population whose mean equals the expected mean.

from scipy.stats import ttest_1samp

ttest_1samp requires two inputs, a sample of values and an expected mean. It returns the t-statistic and a p-value.

tstat, pval = ttest_1samp(example_distribution, expected_mean)
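
As a minimal sketch of how the result might be interpreted, the example below runs the test on made-up measurements. The sample values, the expected mean of 100, and the 0.05 significance level are all assumptions chosen for illustration.

```python
from scipy.stats import ttest_1samp

# Hypothetical sample: ten made-up measurements, for illustration only.
sample = [102, 98, 105, 110, 97, 101, 108, 104, 99, 106]
expected_mean = 100  # hypothesized population mean (assumption)
alpha = 0.05         # significance level (assumption)

tstat, pval = ttest_1samp(sample, expected_mean)

# Reject the null hypothesis only when the p-value falls below alpha.
reject_null = pval < alpha
print(f"t = {tstat:.3f}, p = {pval:.3f}, reject null: {reject_null}")
```

A positive t-statistic means the sample mean lies above the expected mean. The p-value is what decides whether that gap is statistically significant.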

2 Sample T-Test

For numerical data.

Compares the means of two independent samples, each assumed to be approximately normally distributed.

The null hypothesis, in this case, is that the two distributions have the same mean.

from scipy.stats import ttest_ind

ttest_ind takes the two samples as inputs and returns the t-statistic and a p-value.

tstat, pval = ttest_ind(dataset1, dataset2)
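
The sketch below applies the test to two small made-up samples. The data are hypothetical. The second call uses the `equal_var=False` option of `ttest_ind`, which runs Welch's t-test and may be the safer choice when the two samples cannot be assumed to share a variance.

```python
from scipy.stats import ttest_ind

# Hypothetical samples, for illustration only.
dataset1 = [20, 22, 19, 24, 25, 21, 23]
dataset2 = [28, 30, 27, 31, 29, 26, 32]

# Default: Student's t-test, which assumes equal population variances.
tstat, pval = ttest_ind(dataset1, dataset2)

# Welch's t-test: does not assume equal variances.
tstat_welch, pval_welch = ttest_ind(dataset1, dataset2, equal_var=False)

print(f"Student: t = {tstat:.3f}, p = {pval:.5f}")
print(f"Welch:   t = {tstat_welch:.3f}, p = {pval_welch:.5f}")
```

The sign of the t-statistic follows the argument order: it is negative when the first sample has the smaller mean.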


ANOVA

For numerical data.

Compares more than two numerical datasets.
ANOVA (Analysis of Variance) tests the null hypothesis that all of the datasets have the same mean.

from scipy.stats import f_oneway

It takes in each dataset as a separate input and returns the F-statistic and the p-value.

fstat, pval = f_oneway(a, b, c)
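
A minimal sketch with three made-up groups follows. The values and the 0.05 threshold are assumptions for illustration. A small p-value here says only that at least one group mean differs, not which one.

```python
from scipy.stats import f_oneway

# Hypothetical groups, for illustration only.
a = [20, 22, 19, 24, 25, 21, 23]
b = [28, 30, 27, 31, 29, 26, 32]
c = [21, 23, 20, 25, 24, 22, 26]

fstat, pval = f_oneway(a, b, c)

alpha = 0.05  # significance level (assumption)
any_difference = pval < alpha
print(f"F = {fstat:.2f}, p = {pval:.6f}, some mean differs: {any_difference}")
```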

Tukey's Range Test

For numerical data.

When ANOVA indicates that the means are not all equal, we can perform Tukey's Range Test to determine which pairs of datasets differ significantly.

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

We have to provide the function with one list of all of the data and a list of labels that tell the function which elements of the list are from which set. We also provide the significance level we want, which is usually 0.05.

values = np.concatenate([a, b, c])
labels = ['a'] * len(a) + ['b'] * len(b) + ['c'] * len(c)
tukey_results = pairwise_tukeyhsd(values, labels, 0.05)
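
The sketch below runs the test end to end on three made-up groups and reads the outcome from the result object. The data are hypothetical, and the example assumes statsmodels is installed. Printing the result shows a summary table, and the `reject` attribute holds one boolean per pair of groups.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical groups, for illustration only.
a = [20, 22, 19, 24, 25, 21, 23]
b = [28, 30, 27, 31, 29, 26, 32]
c = [21, 23, 20, 25, 24, 22, 26]

# One flat array of values plus a parallel list of group labels.
values = np.concatenate([a, b, c])
labels = ['a'] * len(a) + ['b'] * len(b) + ['c'] * len(c)

tukey_results = pairwise_tukeyhsd(values, labels, 0.05)

# Summary table: one row per pair, with mean difference and reject flag.
print(tukey_results)

# True where the pair's means differ significantly at the chosen level.
rejections = list(tukey_results.reject)
```

The pairs are reported in sorted label order, so with labels 'a', 'b', 'c' the comparisons are a vs. b, a vs. c, and b vs. c.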

Binomial Test

For categorical data.

To analyze a dataset with two different possibilities for entries.

The null hypothesis, in this case, is that the true probability of success equals the expected probability, so any difference between the observed and expected behavior is due to chance.

from scipy.stats import binomtest

binomtest requires three inputs, the number of observed successes, the number of total trials, and an expected probability of success. It replaces the older binom_test function, which was removed in SciPy 1.12. It returns a result object whose pvalue attribute holds the p-value.

pval = binomtest(525, n=1000, p=0.5).pvalue
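
The sketch below uses the same numbers (525 successes in 1000 trials against an expected probability of 0.5) with `binomtest`, the replacement for `binom_test` in recent SciPy versions. The one-sided variant and the confidence interval are additions for illustration, not part of the original example.

```python
from scipy.stats import binomtest

successes, trials, expected_p = 525, 1000, 0.5

# Two-sided test (the default): is the success rate different from 0.5?
result = binomtest(successes, n=trials, p=expected_p)
pval = result.pvalue

# One-sided test: is the success rate greater than 0.5?
pval_greater = binomtest(successes, n=trials, p=expected_p,
                         alternative='greater').pvalue

# 95% confidence interval for the true success probability.
ci = result.proportion_ci(confidence_level=0.95)

print(f"two-sided p = {pval:.4f}, one-sided p = {pval_greater:.4f}")
print(f"95% CI for the success rate: ({ci.low:.4f}, {ci.high:.4f})")
```

If the confidence interval contains the expected probability, the two-sided test will generally not reject the null hypothesis at the matching significance level.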

Chi Square Test

For categorical data.

To compare two or more categorical datasets.

The null hypothesis, in this case, is that the outcome is independent of the condition.

from scipy.stats import chi2_contingency

The input to chi2_contingency is a contingency table where:

- The columns represent different outcomes, like "Survey Response A" vs. "Survey Response B" or "Clicked a Link" vs. "Didn't Click"
- The rows are each a different condition, such as men vs. women or Interface A vs. Interface B

X = [[30, 10],
     [35, 5],
     [28, 12],
     [20, 20]]
_, pval, _, _ = chi2_contingency(X)
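
The three values discarded above are the chi-square statistic, the degrees of freedom, and the table of expected counts under the null hypothesis. As a sketch, the example below keeps all four for the same table. The variable names are chosen here for illustration.

```python
from scipy.stats import chi2_contingency

# Rows: four conditions. Columns: two outcomes.
X = [[30, 10],
     [35, 5],
     [28, 12],
     [20, 20]]

chi2, pval, dof, expected = chi2_contingency(X)

# Degrees of freedom are (rows - 1) * (columns - 1).
print(f"chi2 = {chi2:.3f}, p = {pval:.4f}, dof = {dof}")

# Counts each cell would have if outcome and condition were independent.
print(expected)
```

Comparing the observed table with the expected counts shows which conditions contribute most to the statistic.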

