Hypothesis Testing with SciPy Cheat Sheet

1 Sample T-Testing

For numerical data.

Compares a sample mean to a hypothetical population mean.

from scipy.stats import ttest_1samp

ttest_1samp requires two inputs: a distribution of values (the sample) and an expected mean. It returns the t-statistic and a p-value.

tstat, pval = ttest_1samp(example_distribution, expected_mean)
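As a minimal sketch of the call above, the example below tests whether a small sample is consistent with a hypothesized mean of 30. The data, the variable names, and the 0.05 threshold are made up for illustration.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical sample (made-up numbers, for illustration only).
ages = np.array([32, 34, 29, 29, 22, 39, 38, 37, 38, 36])
expected_mean = 30  # hypothesized population mean

# Two-sided test of the null hypothesis "the population mean is 30".
tstat, pval = ttest_1samp(ages, expected_mean)

alpha = 0.05  # assumed significance level
reject_null = pval < alpha
print(f"t = {tstat:.3f}, p = {pval:.3f}, reject null: {reject_null}")
```

A p-value below the chosen significance level suggests that the sample mean differs from the expected mean by more than chance alone would explain.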

2 Sample T-Test

For numerical data.

Compares two sets of data, which are both approximately normally distributed.

The null hypothesis, in this case, is that the two distributions have the same mean.

from scipy.stats import ttest_ind

It takes the two distributions as inputs and returns the t-statistic and a p-value.

t, pval = ttest_ind(dataset1, dataset2)
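The following sketch applies ttest_ind to two small made-up samples (the names week1 and week2 are hypothetical). It also shows the equal_var=False option, which runs Welch's variant of the test and may be preferable when the two groups are not assumed to share a variance.

```python
from scipy.stats import ttest_ind

# Hypothetical measurements from two independent groups (made-up numbers).
week1 = [25.4, 27.1, 24.8, 26.3, 25.9, 26.7, 24.5, 25.2]
week2 = [28.9, 30.2, 29.5, 31.1, 28.4, 29.8, 30.6, 29.1]

# Standard two-sample t-test (assumes equal population variances).
t, pval = ttest_ind(week1, week2)

# Welch's t-test: does not assume equal variances.
t_welch, pval_welch = ttest_ind(week1, week2, equal_var=False)

print(f"standard: t = {t:.3f}, p = {pval:.5f}")
print(f"Welch:    t = {t_welch:.3f}, p = {pval_welch:.5f}")
```

The sign of the t-statistic reflects the order of the arguments: it is negative here because the first sample has the smaller mean.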


ANOVA

For numerical data.

Compares more than two numerical datasets.
ANOVA (Analysis of Variance) tests the null hypothesis that all of the datasets have the same mean.

from scipy.stats import f_oneway

It takes in each dataset as a separate input and returns the F-statistic and the p-value.

fstat, pval = f_oneway(a, b, c)
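A short sketch of a one-way ANOVA on three made-up groups follows. The values are hypothetical and chosen so that group c sits well above the other two.

```python
from scipy.stats import f_oneway

# Hypothetical measurements for three groups (made-up numbers).
a = [20.1, 21.3, 19.8, 20.7, 20.4]
b = [20.5, 19.9, 21.0, 20.2, 20.8]
c = [24.9, 25.4, 24.1, 25.8, 24.6]

# Null hypothesis: all three groups share the same population mean.
fstat, pval = f_oneway(a, b, c)

print(f"F = {fstat:.2f}, p = {pval:.2e}")
```

A small p-value indicates only that at least one group mean differs. It does not say which one, so a follow-up pairwise comparison is needed to find that out.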

Tukey's Range Test

For numerical data.

We can perform Tukey's Range Test to determine which pairs of datasets differ significantly from each other.

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

We have to provide the function with one list of all of the data and a list of labels that tell the function which elements of the list are from which set. We also provide the significance level we want, which is usually 0.05.

values = np.concatenate([a, b, c])
labels = ['a'] * len(a) + ['b'] * len(b) + ['c'] * len(c)
tukey_results = pairwise_tukeyhsd(values, labels, alpha=0.05)
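To make the values and labels layout concrete, here is a sketch that runs the test on three small made-up groups. The data is hypothetical. The reject attribute used at the end holds one boolean per pair of groups, ordered a-b, a-c, b-c for these labels.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements for three groups (made-up numbers).
a = [20.1, 21.3, 19.8, 20.7, 20.4]
b = [20.5, 19.9, 21.0, 20.2, 20.8]
c = [24.9, 25.4, 24.1, 25.8, 24.6]

# One flat array of observations plus a parallel list of group labels.
values = np.concatenate([a, b, c])
labels = ['a'] * len(a) + ['b'] * len(b) + ['c'] * len(c)

tukey_results = pairwise_tukeyhsd(values, labels, alpha=0.05)

# Printing the result shows a summary table with one row per pair of groups.
print(tukey_results)

# Boolean array: True where the null hypothesis of equal means is rejected.
print(tukey_results.reject)
```

Each row of the printed table covers one pair of groups and includes the mean difference, a confidence interval, and whether the null hypothesis is rejected at the chosen significance level.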

Binomial Test

For categorical data.

Analyzes a dataset in which each entry takes one of two possible values, such as success or failure.

The null hypothesis, in this case, would be that there is no difference between the observed behavior and the expected behavior.

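from scipy.stats import binomtest

binomtest requires three inputs: the number of observed successes, the number of total trials, and an expected probability of success. It returns a result object whose pvalue attribute holds the p-value. (Older SciPy releases offered binom_test, which returned the p-value directly; it was deprecated and then removed in SciPy 1.12.)

pval = binomtest(525, n=1000, p=0.5).pvalue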
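The sketch below runs the same numbers (525 successes in 1000 trials against an expected probability of 0.5) through binomtest, the result-object interface in scipy.stats. It assumes SciPy 1.7 or later is installed, and the 0.05 threshold is an assumed significance level.

```python
from scipy.stats import binomtest

# Observed: 525 successes out of 1000 trials; expected success rate of 0.5.
result = binomtest(525, n=1000, p=0.5)

# The result object carries the two-sided p-value as an attribute.
pval = result.pvalue

alpha = 0.05  # assumed significance level
reject_null = pval < alpha
print(f"p = {pval:.4f}, reject null: {reject_null}")
```

The p-value is above 0.05 here, so 525 successes out of 1000 is within the range that chance alone could plausibly produce.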

Chi Square Test

For categorical data.

Compares two or more categorical datasets.

from scipy.stats import chi2_contingency

The input to chi2_contingency is a contingency table where:

- The columns represent different outcomes, like "Survey Response A" vs. "Survey Response B" or "Clicked a Link" vs. "Didn't Click"
- The rows are each a different condition, such as men vs. women or Interface A vs. Interface B

X = [[30, 10],
     [35, 5],
     [28, 12],
     [20, 20]]
_, pval, _, _ = chi2_contingency(X)
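The four values returned by chi2_contingency are, in order, the chi-square statistic, the p-value, the degrees of freedom, and the table of expected counts under independence. The sketch below unpacks all four for the table above; the variable names are chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows are conditions, columns are outcomes (same table as above).
X = [[30, 10],
     [35, 5],
     [28, 12],
     [20, 20]]

chi2, pval, dof, expected = chi2_contingency(X)

print(f"chi2 = {chi2:.3f}, p = {pval:.4f}, dof = {dof}")

# Expected counts if outcome were independent of condition.
print(np.round(expected, 2))
```

Comparing the observed table with the expected counts shows which rows contribute most to the statistic.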

