Cheatography https://cheatography.com

StatiscalThinkingPython Cheat Sheet (DRAFT) by elhamsh

This is a draft cheat sheet. It is a work in progress and is not finished yet.

EDA

import seaborn as sns	Seaborn is used here to set the plotting style and to draw statistical plots
sns.set()	Set default Seaborn style
The "square root rule" is a commonly used rule of thumb for choosing the number of histogram bins: set the number of bins to the square root of the number of samples.
Bee swarm plot	Draw a categorical scatterplot with non-overlapping points.
sns.swarmplot(x='colname1', y='colname2', data=df)	colname1 is the categorical column (x axis); colname2 is the numeric column (y axis).
ECDF	Empirical cumulative distribution function. It is one of the important plots for understanding the data.
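plt.plot(x, y, marker='.', linestyle='none')	Plot an ECDF as unconnected points: x is the sorted data and y is the fraction of data points less than or equal to each x
plt.margins(0.02)	Add 2% padding around the data so points stay off the plot edges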
np.arange(3,7)	array([3, 4, 5, 6])
numpy.arange([start, ]stop, [step, ]dtype=None)	Return evenly spaced values within a given interval.
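
The ECDF and square root rule entries above can be sketched as follows. This is a minimal illustration, not the cheat sheet author's code: the function name ecdf and the array sample are hypothetical names introduced for this example.

```python
import numpy as np

def ecdf(data):
    """Return x, y values for the ECDF of a 1D array (hypothetical helper)."""
    data = np.asarray(data)
    n = len(data)
    x = np.sort(data)              # x: the data in ascending order
    y = np.arange(1, n + 1) / n    # y: fraction of points <= each x
    return x, y

# Hypothetical sample data, only for illustration
sample = np.array([3.0, 1.0, 2.0, 4.0])
x, y = ecdf(sample)

# Square root rule: number of histogram bins = sqrt(number of samples)
n_bins = int(np.sqrt(len(sample)))
```

The resulting x and y can be passed directly to plt.plot(x, y, marker='.', linestyle='none'), and n_bins to the bins argument of plt.hist().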

numpy

np.percentile(arrayname, [2.5, 25])	Compute the 2.5th and 25th percentiles of the array arrayname
sns.boxplot(x='colname1', y='colname2', data=df)	Draw a box plot of the numeric column colname2 for each category in colname1
np.var(arrayname)	Compute the variance of the NumPy array arrayname (population variance by default, ddof=0)
np.std(arrayname)	Compute the standard deviation of the NumPy array arrayname (the square root of np.var(arrayname))
np.cov(x, y)	Returns a 2D array called the covariance matrix, since it organizes the variances and the covariance. Entries [0,1] and [1,0] are the covariance of x and y. Entry [0,0] is the variance of the data in x, and entry [1,1] is the variance of the data in y. Note that np.cov() normalizes by n-1 by default, unlike np.var().
np.corrcoef(x, y)	Computes the Pearson correlation coefficient, also called the Pearson r, which is often easier to interpret than the covariance. Like np.cov(), it takes two arrays as arguments and returns a 2D array. Entries [0,0] and [1,1] are necessarily equal to 1, because each variable is perfectly correlated with itself; the value we are after is entry [0,1].
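
As a small sketch of how np.cov() and np.corrcoef() relate, the Pearson r equals the covariance divided by the product of the two standard deviations. The arrays x and y below are made-up values used only for illustration.

```python
import numpy as np

# Hypothetical paired measurements, only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])

cov_matrix = np.cov(x, y)       # 2x2 covariance matrix
cov_xy = cov_matrix[0, 1]       # covariance of x and y
var_x = cov_matrix[0, 0]        # variance of x (n-1 normalization)
var_y = cov_matrix[1, 1]        # variance of y (n-1 normalization)

r = np.corrcoef(x, y)[0, 1]     # Pearson correlation coefficient

# The same value computed by hand from the covariance matrix
r_manual = cov_xy / np.sqrt(var_x * var_y)
```

Here var_x matches np.var(x, ddof=1) rather than np.var(x), because np.cov() divides by n-1 by default.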

hypotheses

permutation sampling	permutation sampling is a great way to simulate the hypothesis that two variables have identical probability distributions
np.concatenate((data1, data2))	Step 1: concatenate the two data sets into one array
np.random.permutation(data)	Step 2: permute the concatenated array, then split it into two samples with the original lengths
The p-value is:	the probability of observing a test statistic at least as extreme as the one you observed, assuming the null hypothesis is true.
a permutation replicate	is a single value of a statistic computed from a permutation sample.
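
The entries above can be combined into a permutation test. The following is a rough sketch under stated assumptions, not a definitive implementation: the helper names (permutation_sample, diff_of_means, draw_perm_reps), the synthetic groups, and the choice of a two-sided difference of means as the test statistic are all hypothetical and introduced only for this example.

```python
import numpy as np

def permutation_sample(data1, data2):
    """Generate one permutation sample from two data sets."""
    data = np.concatenate((data1, data2))      # pool the data
    permuted = np.random.permutation(data)     # shuffle the pooled data
    return permuted[:len(data1)], permuted[len(data1):]

def diff_of_means(a, b):
    """Test statistic (an assumption for this sketch): difference of means."""
    return np.mean(a) - np.mean(b)

def draw_perm_reps(data1, data2, func, size=1):
    """Compute `size` permutation replicates of the statistic `func`."""
    reps = np.empty(size)
    for i in range(size):
        perm1, perm2 = permutation_sample(data1, data2)
        reps[i] = func(perm1, perm2)
    return reps

np.random.seed(42)
# Synthetic groups, only for illustration
group_a = np.random.normal(10.0, 1.0, size=50)
group_b = np.random.normal(10.5, 1.0, size=50)

observed = diff_of_means(group_a, group_b)
perm_reps = draw_perm_reps(group_a, group_b, diff_of_means, size=1000)

# Two-sided p-value: fraction of replicates at least as extreme as observed
p_value = np.mean(np.abs(perm_reps) >= np.abs(observed))
```

Each element of perm_reps is one permutation replicate; the p-value is the fraction of replicates at least as extreme as the observed statistic.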

probabilistic logic

Statistical inference means going from your data to probabilistic conclusions about what you would expect if you collected even more data, so that you can make decisions based on those conclusions.
np.random.random()	Return a random float in the half-open interval [0, 1)
np.random.seed(42)	Seed the random number generator
np.empty(100000)	Allocate an uninitialized array of 100,000 entries (the contents are arbitrary until you assign them)
np.random.binomial(n=100, p=0.05, size=10000)	Draw 10,000 samples from a binomial distribution with n=100 trials and success probability p=0.05
np.random.poisson(10, size=10000)	Draw 10,000 samples from a Poisson distribution with a mean of 10
np.random.normal(20, 1, size=100000)	Draw 100,000 samples from a Normal distribution that has a mean of 20 and a standard deviation of 1
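plt.hist(array, bins=100, density=True, histtype='step')	density=True normalizes the histogram so its total area is 1 (it replaces the deprecated normed=True, which newer Matplotlib versions no longer accept); histtype='step' draws an unfilled outline instead of filled bars
plt.ylim(a, b)	Set the y-axis limits to a and b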
np.random.exponential(mean, size=size)	Draw size samples from an exponential distribution with the given mean (the scale parameter)
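slope, intercept = np.polyfit(x, y, 1)	Fit a straight line to the points (x, y) by least squares. The third argument is the polynomial degree; degree 1 returns two coefficients, the slope and the intercept. A higher degree returns more coefficients, highest power first.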
np.linspace(a, b, c)	Return c evenly spaced points from a to b, with both endpoints included
np.empty_like(variable)	Return a new uninitialized array with the same shape and dtype as the given array "variable"
Bootstrapping	The use of resampled data to perform statistical inference
If we have a data set with n repeated measurements, a bootstrap sample is an array of length n that was drawn from the original data with replacement.
np.random.choice(array, size=n)	Generate a bootstrap sample of size n from array (sampling is with replacement by default)
Confidence interval of a statistic	If we repeated the measurements over and over again, p% of the observed values of the statistic would lie within the p% confidence interval.
A confidence interval gives bounds on the range of parameter values you might expect to get if you repeated your measurements. For named distributions, you can compute them analytically or look them up, but one of the many beautiful properties of the bootstrap method is that you can just take percentiles of your bootstrap replicates to get your confidence interval. Conveniently, you can use the np.percentile() function.
pairs bootstrap	involves resampling pairs of data.
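
The bootstrap entries above can be sketched roughly as follows. This is one possible illustration, not a definitive implementation: the helper names (bootstrap_replicate_1d, draw_bs_reps, draw_bs_pairs_linreg) and the synthetic data are hypothetical, and the 95% level is an arbitrary choice for the example.

```python
import numpy as np

def bootstrap_replicate_1d(data, func):
    """Compute one bootstrap replicate of the statistic `func`."""
    bs_sample = np.random.choice(data, size=len(data))  # with replacement
    return func(bs_sample)

def draw_bs_reps(data, func, size=1):
    """Compute `size` bootstrap replicates of `func`."""
    reps = np.empty(size)
    for i in range(size):
        reps[i] = bootstrap_replicate_1d(data, func)
    return reps

def draw_bs_pairs_linreg(x, y, size=1):
    """Pairs bootstrap for a straight-line fit: resample (x, y) pairs together."""
    inds = np.arange(len(x))
    slopes = np.empty(size)
    intercepts = np.empty(size)
    for i in range(size):
        bs_inds = np.random.choice(inds, size=len(inds))  # resample indices
        slopes[i], intercepts[i] = np.polyfit(x[bs_inds], y[bs_inds], 1)
    return slopes, intercepts

np.random.seed(42)

# Synthetic measurements, only for illustration
data = np.random.normal(20, 1, size=200)
bs_means = draw_bs_reps(data, np.mean, size=2000)
# 95% confidence interval of the mean from percentiles of the replicates
ci_low, ci_high = np.percentile(bs_means, [2.5, 97.5])

# Synthetic linear data (true slope 2, intercept 1), only for illustration
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + np.random.normal(0, 0.5, size=50)
slopes, intercepts = draw_bs_pairs_linreg(x, y, size=500)
slope_ci = np.percentile(slopes, [2.5, 97.5])
```

Resampling indices rather than values keeps each x paired with its own y, which is the point of the pairs bootstrap.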
