Hypothesis testing with Scipy Cheat Sheet by

1 Sample T-Testing

For numerical data.

Compares a sample mean to a hypothetical population mean.

from scipy.stats import ttest_1samp

ttest_1samp requires two inputs: a distribution of values and an expected mean.

tstat, pval = ttest_1samp(example_distribution, expected_mean)
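As a minimal sketch of the call above, the snippet below runs a one-sample test on made-up data. The `ages` sample, the expected mean of 30, and the 0.05 threshold are illustrative assumptions, not values from this cheat sheet.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical sample: ages of 10 site visitors (made-up numbers).
ages = np.array([28, 31, 25, 34, 29, 30, 27, 33, 26, 32])
expected_mean = 30  # assumed population mean under the null hypothesis

tstat, pval = ttest_1samp(ages, expected_mean)

# Compare the p-value to a chosen significance level (0.05 is an assumption).
alpha = 0.05
reject_null = pval < alpha
print(f"t = {tstat:.3f}, p = {pval:.3f}, reject null: {reject_null}")
```

A p-value below the chosen threshold would suggest the sample mean differs from the expected mean; here the sample mean (29.5) is close to 30, so the null hypothesis is not rejected.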

2 Sample T-Test

For numerical data.

Compares two sets of data, which are both approximately normally distributed.

The null hypothesis, in this case, is that the two distributions have the same mean.

from scipy.stats import ttest_ind

It takes the two distributions as inputs and returns the t-statistic and a p-value.

t, pval = ttest_ind(dataset1, dataset2)
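The following sketch applies `ttest_ind` to two small made-up samples. The names `week1` and `week2` and their values are hypothetical. By default the test assumes equal variances; passing `equal_var=False` switches to Welch's t-test, which may be the safer choice when that assumption is doubtful.

```python
from scipy.stats import ttest_ind

# Hypothetical measurements from two independent groups (made-up numbers).
week1 = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
week2 = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2, 12.6, 13.3]

# Standard two-sample t-test (assumes equal variances).
t, pval = ttest_ind(week1, week2)

# Welch's t-test: does not assume equal variances.
t_welch, pval_welch = ttest_ind(week1, week2, equal_var=False)

print(f"Student: t = {t:.3f}, p = {pval:.5f}")
print(f"Welch:   t = {t_welch:.3f}, p = {pval_welch:.5f}")
```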


ANOVA

For numerical data.

Compares more than two numerical datasets.
ANOVA (Analysis of Variance) tests the null hypothesis that all of the datasets have the same mean.

from scipy.stats import f_oneway

It takes in each dataset as a different input and returns the F-statistic and the p-value.

fstat, pval = f_oneway(a, b, c)
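A small worked sketch of one-way ANOVA follows. The three groups are made-up numbers chosen so that `c` clearly differs from `a` and `b`. Note that the first value `f_oneway` returns is an F-statistic.

```python
from scipy.stats import f_oneway

# Hypothetical groups (made-up numbers): a and b share a mean, c is higher.
a = [20, 22, 19, 21, 23]
b = [21, 20, 22, 19, 23]
c = [30, 31, 29, 32, 28]

fstat, pval = f_oneway(a, b, c)

# A small p-value suggests at least one group mean differs from the others;
# it does not say which one.
print(f"F = {fstat:.2f}, p = {pval:.2e}")
```

ANOVA alone only indicates that some difference exists among the groups; a follow-up test such as Tukey's Range Test is needed to identify which pairs differ.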

Tukey's Range Test

For numerical data.

We can perform a Tukey's Range Test to determine which pairs of datasets differ significantly, typically after ANOVA has indicated that some difference exists.

from statsmodels.stats.multicomp import pairwise_tukeyhsd

We have to provide the function with one list of all of the data and a list of labels that tell the function which elements of the list are from which set. We also provide the significance level we want, which is usually 0.05.

import numpy as np
values = np.concatenate([a, b, c])
labels = ['a'] * len(a) + ['b'] * len(b) + ['c'] * len(c)
tukey_results = pairwise_tukeyhsd(values, labels, alpha=0.05)
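To show how the result might be read, here is a sketch on made-up groups. The data values are hypothetical. The `reject` attribute of the result holds one boolean per pair of groups, and `summary()` gives a table of the pairwise comparisons.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical groups (made-up numbers): a and b share a mean, c is higher.
a = [20, 22, 19, 21, 23]
b = [21, 20, 22, 19, 23]
c = [30, 31, 29, 32, 28]

# One flat array of values plus a parallel list of group labels.
values = np.concatenate([a, b, c])
labels = ['a'] * len(a) + ['b'] * len(b) + ['c'] * len(c)

tukey_results = pairwise_tukeyhsd(values, labels, alpha=0.05)

# Table of pairwise comparisons with mean differences and reject decisions.
print(tukey_results.summary())

# One boolean per pair, in the order (a, b), (a, c), (b, c).
print(tukey_results.reject)
```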

Binomial Test

For categorical data.

To analyze a dataset with two different possibilities for entries.

The null hypothesis, in this case, would be that there is no difference between the observed behavior and the expected behavior.

from scipy.stats import binomtest

binomtest requires three inputs: the number of observed successes, the number of total trials, and an expected probability of success. It returns a result object whose pvalue attribute holds the p-value. (Older SciPy versions provided binom_test, which returned the p-value directly; it was deprecated in SciPy 1.10 and removed in 1.12.)

pval = binomtest(525, n=1000, p=0.5).pvalue
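The sketch below uses `binomtest`, the name this test has in SciPy 1.7 and later, with the same numbers as the line above (525 successes in 1000 trials, expected probability 0.5). The one-sided variant and the confidence interval are additions for illustration.

```python
from scipy.stats import binomtest

# 525 observed successes out of 1000 trials, expected success rate 0.5.
result = binomtest(525, n=1000, p=0.5)
pval = result.pvalue  # two-sided by default

# One-sided alternative: is the true success rate greater than 0.5?
pval_greater = binomtest(525, n=1000, p=0.5, alternative='greater').pvalue

# Confidence interval for the true success probability.
ci = result.proportion_ci(confidence_level=0.95)

print(f"two-sided p = {pval:.4f}, one-sided p = {pval_greater:.4f}")
print(f"95% CI for success rate: ({ci.low:.4f}, {ci.high:.4f})")
```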

Chi Square Test

For categorical data.

To compare two or more categorical datasets.

from scipy.stats import chi2_contingency

The input to chi2_contingency is a contingency table where:

- The columns represent different outcomes, like "Survey Response A" vs. "Survey Response B" or "Clicked a Link" vs. "Didn't Click"
- The rows are each a different condition, such as men vs. women or Interface A vs. Interface B

X = [[30, 10],
     [35, 5],
     [28, 12],
     [20, 20]]
_, pval, _, _ = chi2_contingency(X)
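The same table can be used to look at all four values that `chi2_contingency` returns: the test statistic, the p-value, the degrees of freedom, and the table of expected counts under the null hypothesis of independence. The variable names below are illustrative choices.

```python
from scipy.stats import chi2_contingency

# Rows: four conditions; columns: two outcomes (same table as above).
X = [[30, 10],
     [35, 5],
     [28, 12],
     [20, 20]]

chi2, pval, dof, expected = chi2_contingency(X)

# Degrees of freedom are (rows - 1) * (columns - 1).
print(f"chi2 = {chi2:.3f}, p = {pval:.4f}, dof = {dof}")

# Expected counts if outcome were independent of condition.
print(expected)
```

Comparing `expected` with `X` shows which conditions drive the result: here every row has the same total, so each row has the same expected counts, and the last row deviates most from them.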
