\documentclass[10pt,a4paper]{article}

% Packages
\usepackage{fancyhdr} % For header and footer
\usepackage{multicol} % Allows multicols in tables
\usepackage{tabularx} % Intelligent column widths
\usepackage{tabulary} % Used in header and footer
\usepackage{hhline} % Border under tables
\usepackage{graphicx} % For images
\usepackage{xcolor} % For hex colours
%\usepackage[utf8x]{inputenc} % For unicode character support
\usepackage[T1]{fontenc} % Without this we get weird character replacements
\usepackage{colortbl} % For coloured tables
\usepackage{setspace} % For line height
\usepackage{lastpage} % Needed for total page number
\usepackage{seqsplit} % Splits long words.
%\usepackage{opensans} % Can't make this work so far. Shame. Would be lovely.
\usepackage[normalem]{ulem} % For underlining links
% Most of the following are not required for the majority
% of cheat sheets but are needed for some symbol support.
\usepackage{amsmath} % Symbols
\usepackage{MnSymbol} % Symbols
\usepackage{wasysym} % Symbols
%\usepackage[english,german,french,spanish,italian]{babel} % Languages

% Document Info
\author{Pranay Kanwar (r4um)}
\pdfinfo{
  /Title (translate-stats-ml.pdf)
  /Creator (Cheatography)
  /Author (Pranay Kanwar (r4um))
  /Subject (Translate\_Stats\_ML Cheat Sheet)
}

% Lengths and widths
\addtolength{\textwidth}{6cm}
\addtolength{\textheight}{-1cm}
\addtolength{\hoffset}{-3cm}
\addtolength{\voffset}{-2cm}
\setlength{\tabcolsep}{0.2cm} % Space between columns
\setlength{\headsep}{-12pt} % Reduce space between header and content
\setlength{\headheight}{85pt} % If less, LaTeX automatically increases it
\renewcommand{\footrulewidth}{0pt} % Remove footer line
\renewcommand{\headrulewidth}{0pt} % Remove header line
\renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit

% These two commands together give roughly
% the right line height in the tables
\renewcommand{\arraystretch}{1.3}
\onehalfspacing

% Commands
\newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour
\newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols
\newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns
\newcommand{\tn}{\tabularnewline} % Required as custom column type in use

% Font and Colours
\definecolor{HeadBackground}{HTML}{333333}
\definecolor{FootBackground}{HTML}{666666}
\definecolor{TextColor}{HTML}{333333}
\definecolor{DarkBackground}{HTML}{A3A3A3}
\definecolor{LightBackground}{HTML}{F3F3F3}
\renewcommand{\familydefault}{\sfdefault}
\color{TextColor}

% Header and Footer
\pagestyle{fancy}
\fancyhead{} % Set header to blank
\fancyfoot{} % Set footer to blank
\fancyhead[L]{
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{C}
  \SetRowColor{DarkBackground}
  \vspace{-7pt}
  {\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent
    \hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}}
  }
\end{tabulary}
\columnbreak
\begin{tabulary}{11cm}{L}
  \vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{Translate\_Stats\_ML Cheat Sheet}}}} \\
  \normalsize{by \textcolor{DarkBackground}{Pranay Kanwar (r4um)} via \textcolor{DarkBackground}{\uline{cheatography.com/57680/cs/18133/}}}
\end{tabulary}
\end{multicols}}

\fancyfoot[L]{ \footnotesize
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{LL}
  \SetRowColor{FootBackground}
  \mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}} \\
  \vspace{-2pt}Pranay Kanwar (r4um) \\
  \uline{cheatography.com/r4um} \\
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}} \\
  \vspace{-2pt}Published 4th December, 2018.\\
  Updated 24th August, 2020.\\
  Page {\thepage} of \pageref{LastPage}.
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Sponsor}} \\
  \SetRowColor{white}
  \vspace{-5pt}
  %\includegraphics[width=48px,height=48px]{dave.jpeg}
  Measure your website readability!\\
  www.readability-score.com
\end{tabulary}
\end{multicols}}

\begin{document}
\raggedright
\raggedcolumns

% Set font size to small. Switch to any value
% from this page to resize cheat sheet text:
% www.emerson.emory.edu/services/latex/latex_169.html
\footnotesize % Small font.

\begin{tabularx}{17.67cm}{x{5.7358 cm} x{5.5671 cm} x{5.5671 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{3}{x{17.67cm}}{\bf\textcolor{white}{Terminology}} \tn
% Row 0
\SetRowColor{LightBackground}
{\bf{Statistics}} & {\bf{Machine learning}} & {\bf{Notes}} \tn
% Row Count 2 (+ 2)
% Row 1
\SetRowColor{white}
data point, record, row of data & example, instance & Both domains also use ``observation,'' which can refer to a single measurement or an entire vector of attributes depending on context. \tn
% Row Count 13 (+ 11)
% Row 2
\SetRowColor{LightBackground}
response variable, dependent variable & label, output & Both domains also use ``target.'' Since practically {\emph{all}} variables depend on other variables, the term ``dependent variable'' is potentially misleading. \tn
% Row Count 25 (+ 12)
% Row 3
\SetRowColor{white}
variable, covariate, predictor, independent variable & feature, \uline{side information} (https://arxiv.org/pdf/1511.06429.pdf), input & The term ``independent variable'' exists for historical reasons but is usually misleading: such a variable typically depends on other variables in the model. \tn
% Row Count 37 (+ 12)
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{x{5.7358 cm} x{5.5671 cm} x{5.5671 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{3}{x{17.67cm}}{\bf\textcolor{white}{Terminology (cont)}} \tn
% Row 4
\SetRowColor{LightBackground}
regressions & supervised learners, machines & Both estimate output(s) in terms of input(s); see the sketch after this table. \tn
% Row Count 4 (+ 4)
% Row 5
\SetRowColor{white}
estimation & learning & Both translate data into quantitative claims, becoming more accurate as the supply of relevant data increases. \tn
% Row Count 13 (+ 9)
% Row 6
\SetRowColor{LightBackground}
hypothesis $\neq$ classifier & hypothesis & In both statistics and ML, a hypothesis is a scientific statement to be scrutinized, such as ``The true value of this parameter is zero.''\newline In ML (but not in statistics), a hypothesis can also \uline{refer to the prediction rule} (https://www.quora.com/What-does-the-hypothesis-space-mean-in-Machine-Learning) that is output by a classifier algorithm. \tn
% Row Count 42 (+ 29)
\end{tabularx}
\par\addvspace{1.3em}
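% Illustrative sketch of the shared supervised-learning setup; the notation is not taken from the source article.
The rows above are different vocabularies for the same basic setup. As a rough sketch (notation ours, for illustration only): a dataset pairs inputs with outputs, and a regression (supervised learner) is fit by minimizing an average loss,
\[
\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n},
\qquad
\hat{f} = \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr),
\]
where each pair $(x_i, y_i)$ is a data point (example, instance), $x_i$ collects the covariates (features), $y_i$ is the response (label), and the fitted $\hat{f}$ is the estimated regression (trained learner).
\par\addvspace{1.3em}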
\begin{tabularx}{17.67cm}{x{5.7358 cm} x{5.5671 cm} x{5.5671 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{3}{x{17.67cm}}{\bf\textcolor{white}{Terminology (cont)}} \tn
% Row 7
\SetRowColor{LightBackground}
bias $\neq$ regression intercept & bias & Statistics \seqsplit{distinguishes} between\newline (a) bias as a form of estimation error and\newline (b) the default prediction of a linear model in the special case where all inputs are 0.\newline \uline{ML sometimes uses ``bias'' to refer to both of these concepts} (https://www.quora.com/Why-do-we-need-the-bias-term-in-ML-algorithms-such-as-linear-regression-and-neural-networks), although the best ML researchers certainly understand the difference. \tn
% Row Count 34 (+ 34)
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{x{5.7358 cm} x{5.5671 cm} x{5.5671 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{3}{x{17.67cm}}{\bf\textcolor{white}{Terminology (cont)}} \tn
% Row 8
\SetRowColor{LightBackground}
Maximize the likelihood to estimate model parameters & If your target distribution is discrete (such as in logistic regression), minimize the cross-entropy to derive the best parameters.\newline If your target distribution is continuous, fine, just maximize the likelihood. & For discrete \seqsplit{distributions}, maximizing the likelihood is equivalent to minimizing the cross-entropy. \tn
% Row Count 17 (+ 17)
% Row 9
\SetRowColor{white}
\uline{Apply Occam's razor} (https://simple.wikipedia.org/wiki/Occam\%27s\_razor), or encode missing prior information with suitably \seqsplit{uninformative} priors & Apply \uline{the principle of maximum entropy} (https://en.wikipedia.org/wiki/Principle\_of\_maximum\_entropy). & The principle of maximum entropy is conceptual and does not refer to maximizing a concrete objective function. The principle is that models should be conservative, in the sense that they should be no more confident in their predictions than is thoroughly justified by the data. In practice this works out to deriving an estimation procedure in terms of a bare-minimum set of criteria, as exemplified \uline{here} (http://cseweb.ucsd.edu/\textasciitilde{}elkan/254spring02/gidofalvi.pdf) or \uline{here} (http://www.win-vector.com/dfiles/LogisticRegressionMaxEnt.pdf). \tn
% Row Count 60 (+ 43)
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{x{5.7358 cm} x{5.5671 cm} x{5.5671 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{3}{x{17.67cm}}{\bf\textcolor{white}{Terminology (cont)}} \tn
% Row 10
\SetRowColor{LightBackground}
\seqsplit{logistic/multinomial} regression & maximum entropy, MaxEnt & \uline{They are equivalent} (https://www.quora.com/What-is-the-relationship-between-Log-Linear-model-MaxEnt-model-and-Logistic-Regression) except in special multinomial settings like ordinal logistic regression. Note that ``maximum entropy'' here refers to the principle of maximum entropy, not the form of the objective function. Indeed, in MaxEnt training you minimize a cross-entropy objective rather than maximize an entropy expression (see the worked equation after this table). \tn
% Row Count 32 (+ 32)
\end{tabularx}
\par\addvspace{1.3em}
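% Illustrative sketch of the likelihood/cross-entropy equivalence referenced above; the notation is not taken from the source article.
The likelihood-versus-entropy rows above can be written out explicitly. As a sketch (assuming a discrete conditional model $p_{\theta}(y \mid x)$; notation ours, for illustration only),
\[
\hat{\theta}
= \operatorname*{arg\,max}_{\theta} \; \sum_{i=1}^{n} \log p_{\theta}(y_i \mid x_i)
= \operatorname*{arg\,min}_{\theta} \; \Bigl( - \frac{1}{n} \sum_{i=1}^{n} \log p_{\theta}(y_i \mid x_i) \Bigr),
\]
and the minimized quantity is the cross-entropy between the empirical distribution of the labels and the model's predicted distribution. This is the sense in which maximum-likelihood estimation and cross-entropy minimization select the same parameters for logistic-regression/MaxEnt models; the principle of maximum entropy is the separate criterion used to motivate the form of $p_{\theta}$, not the objective that is optimized during fitting.
\par\addvspace{1.3em}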
\begin{tabularx}{17.67cm}{x{5.7358 cm} x{5.5671 cm} x{5.5671 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{3}{x{17.67cm}}{\bf\textcolor{white}{Terminology (cont)}} \tn
% Row 11
\SetRowColor{LightBackground}
X causes Y if surgical (or randomized controlled) \seqsplit{manipulations} in X are correlated with changes in Y & X causes Y if it doesn't obviously not cause Y. For example, \uline{X causes Y if X precedes Y in time (or is at least contemporaneous)} (http://www.cs.cmu.edu/\textasciitilde{}bziebart/publications/maximum-causal-entropy.pdf). & The stats definition is more aligned with common-sense intuition than the ML one proposed here. In fairness, not all ML \seqsplit{practitioners} are so abusive of causation terminology, and some of the blame belongs with even earlier abuses such as \uline{Granger causality} (https://en.wikipedia.org/wiki/Granger\_causality\#Limitations). \tn
% Row Count 26 (+ 26)
% Row 12
\SetRowColor{white}
structural equations model & Bayesian network & These are nearly equivalent \seqsplit{mathematically}, although \seqsplit{interpretations} differ by use case, \uline{as discussed} (https://stats.stackexchange.com/questions/103183/structural-equation-models-sems-versus-bayesian-networks-bns). \tn
% Row Count 44 (+ 18)
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{x{5.7358 cm} x{5.5671 cm} x{5.5671 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{3}{x{17.67cm}}{\bf\textcolor{white}{Terminology (cont)}} \tn
% Row 13
\SetRowColor{LightBackground}
sequential experimental design & active learning, \seqsplit{reinforcement} learning, \seqsplit{hyperparameter} optimization & Although these four subfields are very different from each other in terms of their standard use cases, they all address problems of optimization via a sequence of \seqsplit{queries/experiments}. \tn
% Row Count 15 (+ 15)
\hhline{>{\arrayrulecolor{DarkBackground}}---}
\SetRowColor{LightBackground}
\mymulticolumn{3}{x{17.67cm}}{Source: \seqsplit{https://insights.sei.cmu.edu/sei\_blog/2018/11/translating-between-statistics-and-machine-learning.html}} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}---}
\end{tabularx}
\par\addvspace{1.3em}

\end{document}