\documentclass[10pt,a4paper]{article}

% Packages
\usepackage{fancyhdr} % For header and footer
\usepackage{multicol} % Allows multicols in tables
\usepackage{tabularx} % Intelligent column widths
\usepackage{tabulary} % Used in header and footer
\usepackage{hhline} % Border under tables
\usepackage{graphicx} % For images
\usepackage{xcolor} % For hex colours
%\usepackage[utf8x]{inputenc} % For unicode character support
\usepackage[T1]{fontenc} % Without this we get weird character replacements
\usepackage{colortbl} % For coloured tables
\usepackage{setspace} % For line height
\usepackage{lastpage} % Needed for total page number
\usepackage{seqsplit} % Splits long words.
%\usepackage{opensans} % Can't make this work so far. Shame. Would be lovely.
\usepackage[normalem]{ulem} % For underlining links
% Most of the following are not required for the majority
% of cheat sheets but are needed for some symbol support.
\usepackage{amsmath} % Symbols
\usepackage{MnSymbol} % Symbols
\usepackage{wasysym} % Symbols
%\usepackage[english,german,french,spanish,italian]{babel} % Languages

% Document Info
\author{elhamsh}
\pdfinfo{
  /Title (statiscalthinkingpython.pdf)
  /Creator (Cheatography)
  /Author (elhamsh)
  /Subject (StatiscalThinkingPython Cheat Sheet)
}

% Lengths and widths
\addtolength{\textwidth}{6cm}
\addtolength{\textheight}{-1cm}
\addtolength{\hoffset}{-3cm}
\addtolength{\voffset}{-2cm}
\setlength{\tabcolsep}{0.2cm} % Space between columns
\setlength{\headsep}{-12pt} % Reduce space between header and content
\setlength{\headheight}{85pt} % If less, LaTeX automatically increases it
\renewcommand{\footrulewidth}{0pt} % Remove footer line
\renewcommand{\headrulewidth}{0pt} % Remove header line
\renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit
% These two commands together give roughly
% the right line height in the tables
\renewcommand{\arraystretch}{1.3}
\onehalfspacing

% Commands
\newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour
\newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols
\newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns
\newcommand{\tn}{\tabularnewline} % Required as custom column type in use

% Font and Colours
\definecolor{HeadBackground}{HTML}{333333}
\definecolor{FootBackground}{HTML}{666666}
\definecolor{TextColor}{HTML}{333333}
\definecolor{DarkBackground}{HTML}{A3A3A3}
\definecolor{LightBackground}{HTML}{F3F3F3}
\renewcommand{\familydefault}{\sfdefault}
\color{TextColor}

% Header and Footer
\pagestyle{fancy}
\fancyhead{} % Set header to blank
\fancyfoot{} % Set footer to blank
\fancyhead[L]{
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{C}
    \SetRowColor{DarkBackground}
    \vspace{-7pt}
    {\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent
        \hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}}
    }
\end{tabulary}
\columnbreak
\begin{tabulary}{11cm}{L}
    \vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{StatiscalThinkingPython Cheat Sheet}}}} \\
    \normalsize{by \textcolor{DarkBackground}{elhamsh} via \textcolor{DarkBackground}{\uline{cheatography.com/31327/cs/14239/}}}
\end{tabulary}
\end{multicols}}

\fancyfoot[L]{ \footnotesize
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{LL}
  \SetRowColor{FootBackground}
  \mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}} \\
  \vspace{-2pt}elhamsh \\
  \uline{cheatography.com/elhamsh} \\
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}} \\
  \vspace{-2pt}Not Yet Published.\\
  Updated 12th January, 2018.\\
  Page {\thepage} of \pageref{LastPage}.
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Sponsor}} \\
  \SetRowColor{white}
  \vspace{-5pt}
  %\includegraphics[width=48px,height=48px]{dave.jpeg}
  Measure your website readability!\\
  www.readability-score.com
\end{tabulary}
\end{multicols}}

\begin{document}
\raggedright
\raggedcolumns

% Set font size to small. Switch to any value
% from this page to resize cheat sheet text:
% www.emerson.emory.edu/services/latex/latex_169.html
\footnotesize % Small font.

\begin{multicols*}{3}

\begin{tabularx}{5.377cm}{x{2.4885 cm} x{2.4885 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{EDA}} \tn
% Row 0
\SetRowColor{LightBackground}
import seaborn as sns & Import Seaborn, used here to set the plotting style \tn
% Row Count 2 (+ 2)
% Row 1
\SetRowColor{white}
sns.set() & Set the default Seaborn style \tn
% Row Count 4 (+ 2)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{The ``square root rule'' is a commonly used rule of thumb for choosing the number of histogram bins: set the number of bins to the square root of the number of samples.} \tn
% Row Count 8 (+ 4)
% Row 3
\SetRowColor{white}
Bee swarm plot & A categorical scatterplot with non-overlapping points. \tn
% Row Count 11 (+ 3)
% Row 4
\SetRowColor{LightBackground}
\seqsplit{sns.swarmplot(x='colname1'}, y='colname2', data=df) & colname1 is the categorical column (x); colname2 holds the numeric values (y). \tn
% Row Count 14 (+ 3)
% Row 5
\SetRowColor{white}
ECDF & Empirical cumulative distribution function: for each value x, the fraction of data points less than or equal to x. One of the most useful plots for exploring a data set.
\tn
% Row Count 20 (+ 6)
% Row 6
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{plt.plot(x, y, marker='.', linestyle='none')} \tn
% Row Count 21 (+ 1)
% Row 7
\SetRowColor{white}
plt.margins(0.02) & Keeps data off the plot edges \tn
% Row Count 23 (+ 2)
% Row 8
\SetRowColor{LightBackground}
np.arange(3, 7) & array({[}3, 4, 5, 6{]}) \tn
% Row Count 24 (+ 1)
% Row 9
\SetRowColor{white}
numpy.arange({[}start, {]}stop, {[}step, {]}dtype=None) & Return evenly spaced values within the half-open interval {[}start, stop). \tn
% Row Count 27 (+ 3)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{2.28942 cm} x{2.68758 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{numpy}} \tn
% Row 0
\SetRowColor{LightBackground}
\seqsplit{np.percentile(arrayname},{[}2.5, 25{]}) & Compute the 2.5th and 25th percentiles of the array arrayname \tn
% Row Count 3 (+ 3)
% Row 1
\SetRowColor{white}
\mymulticolumn{2}{x{5.377cm}}{sns.boxplot(x='colname1', y='colname2', data=df)} \tn
% Row Count 4 (+ 1)
% Row 2
\SetRowColor{LightBackground}
np.var(arrayname) & Compute the variance of the NumPy array arrayname \tn
% Row Count 7 (+ 3)
% Row 3
\SetRowColor{white}
np.std(arrayname) & Compute the standard deviation of the NumPy array arrayname \tn
% Row Count 10 (+ 3)
% Row 4
\SetRowColor{LightBackground}
np.cov(x, y) & Returns a 2D array called the covariance matrix, since it organizes the variances and covariances. Entries {[}0,1{]} and {[}1,0{]} are both the covariance of x and y. Entry {[}0,0{]} is the variance of the data in x, and entry {[}1,1{]} is the variance of the data in y. \tn
% Row Count 23 (+ 13)
% Row 5
\SetRowColor{white}
np.corrcoef(x, y) & The Pearson correlation coefficient, also called the Pearson r, is often easier to interpret than the covariance. It is computed with the np.corrcoef() function. Like np.cov(), it takes two arrays as arguments and returns a 2D array.
Entries {[}0,0{]} and {[}1,1{]} are necessarily equal to 1, since each array is perfectly correlated with itself; the value we are after is entry {[}0,1{]}. \tn
% Row Count 40 (+ 17)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{2.43873 cm} x{2.53827 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{hypotheses}} \tn
% Row 0
\SetRowColor{LightBackground}
Permutation sampling & A way to simulate the hypothesis that two variables have identical probability distributions: pool the two data sets, shuffle the pooled values, and split them back into groups of the original sizes. \tn
% Row Count 7 (+ 7)
% Row 1
\SetRowColor{white}
\seqsplit{np.random.permutation(data)} & Permute the concatenated array \tn
% Row Count 9 (+ 2)
% Row 2
\SetRowColor{LightBackground}
\seqsplit{np.concatenate((data1}, data2)) & Concatenate the data sets \tn
% Row Count 11 (+ 2)
% Row 3
\SetRowColor{white}
p-value & The probability of observing a test statistic at least as extreme as the one you observed, assuming the null hypothesis is true. \tn
% Row Count 19 (+ 8)
% Row 4
\SetRowColor{LightBackground}
Permutation replicate & A single value of a statistic computed from a permutation sample.
\tn
% Row Count 23 (+ 4)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{2.4885 cm} x{2.4885 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{probabilistic logic}} \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{Statistical inference means going from your data to probabilistic conclusions about what you would expect if you collected even more data, and then making decisions based on those conclusions.} \tn
% Row Count 4 (+ 4)
% Row 1
\SetRowColor{white}
np.random.random() & Return a random number in the half-open interval {[}0, 1) \tn
% Row Count 7 (+ 3)
% Row 2
\SetRowColor{LightBackground}
np.random.seed(42) & Seed the random number generator \tn
% Row Count 9 (+ 2)
% Row 3
\SetRowColor{white}
np.empty(100000) & Allocate an uninitialized array of 100,000 entries (e.g. random\_numbers); its contents are arbitrary until assigned \tn
% Row Count 13 (+ 4)
% Row 4
\SetRowColor{LightBackground}
\seqsplit{np.random.binomial(n=100}, p=0.05, size=10000) & Draw 10,000 samples from a binomial distribution with n=100 trials and success probability p=0.05 (e.g. n\_defaults) \tn
% Row Count 17 (+ 4)
% Row 5
\SetRowColor{white}
\seqsplit{np.random.poisson(10}, size=10000) & Draw 10,000 samples from a Poisson distribution with a mean of 10 \tn
% Row Count 21 (+ 4)
% Row 6
\SetRowColor{LightBackground}
np.random.normal(20, 1, size=100000) & Draw 100,000 samples from a Normal distribution that has a mean of 20 and a standard deviation of 1 \tn
% Row Count 26 (+ 5)
% Row 7
\SetRowColor{white}
plt.hist(array, bins=100, density=True, histtype='step') & density=True normalizes the histogram area to 1 (older Matplotlib versions used normed=True); histtype='step' draws an unfilled outline \tn
% Row Count 29 (+ 3)
% Row 8
\SetRowColor{LightBackground}
plt.ylim(a, b) & Limit the y-axis to the range from a to b \tn
% Row Count 31 (+ 2)
\end{tabularx}
\par\addvspace{1.3em}
\vfill
\columnbreak
\begin{tabularx}{5.377cm}{x{2.4885 cm} x{2.4885 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{probabilistic logic (cont)}} \tn
% Row 9
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{\seqsplit{np.random.exponential(mean}, size=size)} \tn
% Row Count 1 (+ 1)
% Row 10
\SetRowColor{white}
slope, intercept = np.polyfit(x, y, 1) & Least-squares fit of a line to the points (x, y), returning its slope and intercept. The third argument is the polynomial degree (1 for a line). \tn
% Row Count 6 (+ 5)
% Row 11
\SetRowColor{LightBackground}
np.linspace(a, b, c) & Return c evenly spaced points between a and b, inclusive \tn
% Row Count 9 (+ 3)
% Row 12
\SetRowColor{white}
\seqsplit{np.empty\_like(variable)} & Return a new, uninitialized array with the same shape and type as the given array variable \tn
% Row Count 14 (+ 5)
% Row 13
\SetRowColor{LightBackground}
Bootstrapping & The use of resampled data to perform statistical inference \tn
% Row Count 17 (+ 3)
% Row 14
\SetRowColor{white}
\mymulticolumn{2}{x{5.377cm}}{If we have a data set with $n$ repeated measurements, a bootstrap sample is an array of length $n$ drawn from the original data with replacement.} \tn
% Row Count 20 (+ 3)
% Row 15
\SetRowColor{LightBackground}
\seqsplit{np.random.choice(array}, size=n) & Generate a bootstrap sample of size n from array (sampling with replacement is the default) \tn
% Row Count 23 (+ 3)
% Row 16
\SetRowColor{white}
Confidence interval of a statistic & If we repeated the measurements over and over again, p\% of the observed values of the statistic would lie within the p\% confidence interval. \tn
% Row Count 29 (+ 6)
% Row 17
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{A confidence interval gives bounds on the range of parameter values you might expect to get if you repeated your measurements. For named distributions, you can compute them analytically or look them up, but one of the many beautiful properties of the bootstrap method is that you can just take percentiles of your bootstrap replicates to get your confidence interval.
Conveniently, you can use the np.percentile() function.} \tn
% Row Count 38 (+ 9)
\end{tabularx}
\par\addvspace{1.3em}
\vfill
\columnbreak
\begin{tabularx}{5.377cm}{x{2.4885 cm} x{2.4885 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{probabilistic logic (cont)}} \tn
% Row 18
\SetRowColor{LightBackground}
Pairs bootstrap & Resampling the data in (x, y) pairs, so that each pair stays together. \tn
% Row Count 2 (+ 2)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

% That's all folks
\end{multicols*}

\end{document}