\documentclass[10pt,a4paper]{article} % Packages \usepackage{fancyhdr} % For header and footer \usepackage{multicol} % Allows multicols in tables \usepackage{tabularx} % Intelligent column widths \usepackage{tabulary} % Used in header and footer \usepackage{hhline} % Border under tables \usepackage{graphicx} % For images \usepackage{xcolor} % For hex colours %\usepackage[utf8x]{inputenc} % For unicode character support \usepackage[T1]{fontenc} % Without this we get weird character replacements \usepackage{colortbl} % For coloured tables \usepackage{setspace} % For line height \usepackage{lastpage} % Needed for total page number \usepackage{seqsplit} % Splits long words. %\usepackage{opensans} % Can't make this work so far. Shame. Would be lovely. \usepackage[normalem]{ulem} % For underlining links % Most of the following are not required for the majority % of cheat sheets but are needed for some symbol support. \usepackage{amsmath} % Symbols \usepackage{MnSymbol} % Symbols \usepackage{wasysym} % Symbols %\usepackage[english,german,french,spanish,italian]{babel} % Languages % Document Info \author{Dylan (dylablo)} \pdfinfo{ /Title (foundation-of-statistics-sec-1-and-2-under-shirin.pdf) /Creator (Cheatography) /Author (Dylan (dylablo)) /Subject (Foundation of Statistics Sec. 
1 \& 2 under Shirin Cheat Sheet) } % Lengths and widths \addtolength{\textwidth}{6cm} \addtolength{\textheight}{-1cm} \addtolength{\hoffset}{-3cm} \addtolength{\voffset}{-2cm} \setlength{\tabcolsep}{0.2cm} % Space between columns \setlength{\headsep}{-12pt} % Reduce space between header and content \setlength{\headheight}{85pt} % If less, LaTeX automatically increases it \renewcommand{\footrulewidth}{0pt} % Remove footer line \renewcommand{\headrulewidth}{0pt} % Remove header line \renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit % This two commands together give roughly % the right line height in the tables \renewcommand{\arraystretch}{1.3} \onehalfspacing % Commands \newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour \newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols \newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns \newcommand{\tn}{\tabularnewline} % Required as custom column type in use % Font and Colours \definecolor{HeadBackground}{HTML}{333333} \definecolor{FootBackground}{HTML}{666666} \definecolor{TextColor}{HTML}{333333} \definecolor{DarkBackground}{HTML}{0BCBD9} \definecolor{LightBackground}{HTML}{EFFBFC} \renewcommand{\familydefault}{\sfdefault} \color{TextColor} % Header and Footer \pagestyle{fancy} \fancyhead{} % Set header to blank \fancyfoot{} % Set footer to blank \fancyhead[L]{ \noindent \begin{multicols}{3} \begin{tabulary}{5.8cm}{C} \SetRowColor{DarkBackground} \vspace{-7pt} {\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent \hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}} } \end{tabulary} \columnbreak \begin{tabulary}{11cm}{L} \vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{Foundation of Statistics Sec. 
1 \& 2 under Shirin Cheat Sheet}}}} \\ \normalsize{by \textcolor{DarkBackground}{Dylan (dylablo)} via \textcolor{DarkBackground}{\uline{cheatography.com/68322/cs/17214/}}} \end{tabulary} \end{multicols}} \fancyfoot[L]{ \footnotesize \noindent \begin{multicols}{3} \begin{tabulary}{5.8cm}{LL} \SetRowColor{FootBackground} \mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}} \\ \vspace{-2pt}Dylan (dylablo) \\ \uline{cheatography.com/dylablo} \\ \end{tabulary} \vfill \columnbreak \begin{tabulary}{5.8cm}{L} \SetRowColor{FootBackground} \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}} \\ \vspace{-2pt}Not Yet Published.\\ Updated 29th September, 2018.\\ Page {\thepage} of \pageref{LastPage}. \end{tabulary} \end{multicols}} \begin{document} \raggedright \raggedcolumns % Set font size to small. Switch to any value % from this page to resize cheat sheet text: % www.emerson.emory.edu/services/latex/latex_169.html \footnotesize % Small font. \begin{multicols*}{2} \begin{tabularx}{8.4cm}{X} \SetRowColor{DarkBackground} \mymulticolumn{1}{x{8.4cm}}{\bf\textcolor{white}{Basic Mathematical Symbols and Explanations}} \tn % Row 0 \SetRowColor{LightBackground} \mymulticolumn{1}{x{8.4cm}}{∑: Sum of the following set/set of values returned from a function. 
There's usually a variable name and assignment underneath it, and a limit above; this means you are summing the values returned by the function as the variable ranges from the bottom value to the top value.\newline \texttt{double sum = 0;}\newline \texttt{for (int i = 0; i \textless{}= 10; i++) \{}\newline \texttt{sum += do\_function(i); \}}} \tn % Row Count 7 (+ 7) % Row 1 \SetRowColor{white} \mymulticolumn{1}{x{8.4cm}}{∏: Product of the following set/set of values returned from a function. There's usually a variable name and assignment underneath it, and a limit above; this means you are taking the product of all the values returned by the function as the variable ranges from the bottom value to the top value.\newline \texttt{double prod = 1;}\newline \texttt{for (int i = 0; i \textless{}= 10; i++) \{}\newline \texttt{prod *= do\_function(i); \}}} \tn % Row Count 15 (+ 8) % Row 2 \SetRowColor{LightBackground} \mymulticolumn{1}{x{8.4cm}}{∀: For all/for every instance. The universal quantifier in predicate logic, i.e. the statement holds true in every case. Can be further qualified as ∀i (assuming i is defined in the preceding statement) followed by a set or function, which reads as ``the statement holds true for all i in the following set/function''} \tn % Row Count 22 (+ 7) % Row 3 \SetRowColor{white} \mymulticolumn{1}{x{8.4cm}}{ℝ: Real numbers, i.e. not imaginary numbers (the square root of a negative number) and not infinity. Integers, negatives, floats, doubles etc. are all real numbers} \tn % Row Count 26 (+ 4) % Row 4 \SetRowColor{LightBackground} \mymulticolumn{1}{x{8.4cm}}{∫: Integral. Used for finding areas, volumes, central points etc. 
For a fuller introduction, see \seqsplit{https://www.mathsisfun.com/calculus/integration-introduction.html}} \tn % Row Count 30 (+ 4) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{X} \SetRowColor{DarkBackground} \mymulticolumn{1}{x{8.4cm}}{\bf\textcolor{white}{Basic Mathematical Symbols and Explanations (cont)}} \tn % Row 5 \SetRowColor{LightBackground} \mymulticolumn{1}{x{8.4cm}}{$\lim_{a \to x}$: The limit of the following function as $a$ approaches $x$, i.e. the value that the function gets arbitrarily close to as $a$ gets close to $x$. e.g.\newline $\lim_{a \to -\infty} F(a) = 0$\newline means that $F(a)$ gets arbitrarily close to 0 as $a$ decreases without bound. Limits can also be taken as $a \to +\infty$, or as $a$ approaches a finite value within the range where the function is defined, e.g.\newline $\lim_{a \to 6} f(a) = 1$\newline where $0 \leq a \leq 6$} \tn % Row Count 10 (+ 10) \hhline{>{\arrayrulecolor{DarkBackground}}-} \end{tabularx} \par\addvspace{1.3em} \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Definitions, Properties, Rules and Laws}} \tn % Row 0 \SetRowColor{LightBackground} The Additivity Property & If $A \cap B = \emptyset$ then $P(A \cup B) = P(A) + P(B)$\newline \newline If $A \cap B \neq \emptyset$ then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$\newline \newline $P(A^c) = 1 - P(A)$ \tn % Row Count 7 (+ 7) % Row 1 \SetRowColor{white} The Multiplication Rule & $P(A \cap B) = P(A \mid B)P(B)$ \tn % Row Count 9 (+ 2) % Row 2 \SetRowColor{LightBackground} The Law of Total Probability & Given disjoint events $B_1, B_2, \ldots, B_m$ such that\newline $\bigcup_{i=1}^{m} B_i = \Omega$\newline (i.e. 
the union of all events $B_1$ through $B_m$ is the same as the entire sample space)\newline Then the probability of an arbitrary event $A$ is expressed as\newline $P(A) = \sum_{i=1}^{m} P(A \mid B_i)P(B_i)$\newline (i.e. the sum, over all events $B_i$, of the probability that $A$ and $B_i$ both occur) \tn % Row Count 26 (+ 17) % Row 3 \SetRowColor{white} Bayes' Rule & Given disjoint events $B_1, B_2, \ldots, B_m$ and\newline $\bigcup_{i=1}^{m} B_i = \Omega$\newline (i.e. the union of all events $B_1$ through $B_m$ is the same as the entire sample space)\newline Then the conditional probability of $B_i$, given that an arbitrary event $A$ occurs, is\newline $P(B_i \mid A) = \frac{P(A \mid B_i)P(B_i)}{\sum_{j=1}^{m} P(A \mid B_j)P(B_j)}$\newline (i.e. the probability of $B_i$ given that $A$ occurs is calculated by dividing the probability $P(A \cap B_i)$, written out with the multiplication rule, by $P(A)$, written out with the law of total probability as the sum of the probabilities of $A$ intersecting every event $B_j$ in the partition) \tn % Row Count 56 (+ 30) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Definitions, Properties, Rules and Laws (cont)}} \tn % Row 4 \SetRowColor{LightBackground} Properties of the Probability Mass Function aka pmf & No probability is negative: $f_X(x) \geq 0$.\newline \newline Any event in the distribution (e.g. ``scoring between 20 and 30'') has a probability of happening of between 0 and 1 (i.e. between 0\% and 100\%).\newline \newline The sum of all probabilities is 100\% (i.e. 1 as a decimal): $\sum_{x} f_X(x) = 1$.\newline \newline The probability of an event $A$ is found by adding up the probabilities of the $x$-values in $A$: $P(X \in A) = \sum_{x \in A} f_X(x)$ \tn % Row Count 20 (+ 20) % Row 5 \SetRowColor{white} Properties of the Cumulative Distribution Function aka cdf & 1. 
If $a \leq b$ then $F(a) \leq F(b)$, i.e. the cdf never decreases as its argument increases.*\newline \newline 2. $F(a)$ is a probability, so $0 \leq F(a) \leq 1$, and\newline $\lim_{a \to +\infty} F(a) = 1$\newline $\lim_{a \to -\infty} F(a) = 0$\newline i.e. $F(a)$ will never return a result bigger than 1 or smaller than 0.\newline \newline 3. $F$ is right-continuous:\newline $\lim_{b \to 0^{+}} F(a+b) = F(a)$.\newline \newline *$a \leq b$ implies that the event $\{X \leq a\}$ is contained in (is a subset of) the event $\{X \leq b\}$ \tn % Row Count 40 (+ 20) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Definitions, Properties, Rules and Laws (cont)}} \tn % Row 6 \SetRowColor{LightBackground} Properties of Expectation aka E(X) & $E(aX) = aE(X)$ for every constant $a$\newline $E(XY) = E(X)E(Y)$ when $X$ and $Y$ are independent\newline $E(a+bX) = a+bE(X)$ (linearity)\newline $E(X+Y) = E(X)+E(Y)$ (linearity)\newline $E\bigl[\sum_{i=1}^{n} X_i\bigr] = \sum_{i=1}^{n} E[X_i]$ \tn % Row Count 10 (+ 10) % Row 7 \SetRowColor{white} Properties of Variance aka Var(X) & $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ for every constant $a$\newline $\mathrm{Var}(a+X) = \mathrm{Var}(X)$ for every constant $a$ \tn % Row Count 15 (+ 5) \hhline{>{\arrayrulecolor{DarkBackground}}--} \end{tabularx} \par\addvspace{1.3em} \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Probability/Statistics \& Set Notation}} \tn % Row 0 \SetRowColor{LightBackground} $P(A)$ & Probability of event $A$ occurring. When all outcomes are equally likely: (number of ways event $A$ can occur) / (total number of possible outcomes) \tn % Row Count 6 (+ 6) % Row 1 \SetRowColor{white} $\Omega$ & Sample Space/Universe. 
$P(\Omega) = 1$ \tn % Row Count 8 (+ 2) % Row 2 \SetRowColor{LightBackground} $\emptyset$ & Empty/Null set \tn % Row Count 9 (+ 1) % Row 3 \SetRowColor{white} $P(A \cap B)$ & Probability of $A$ intersection $B$, i.e. of both $A$ and $B$ occurring \tn % Row Count 11 (+ 2) % Row 4 \SetRowColor{LightBackground} \seqsplit{Disjoint/Mutually} Exclusive & If $A \cap B = \emptyset$ then $A$ and $B$ are disjoint (mutually exclusive): they cannot both occur. This is not the same as independence, which means $P(A \cap B) = P(A)P(B)$ \tn % Row Count 14 (+ 3) % Row 5 \SetRowColor{white} $P(A \cup B)$ & If $A$ and $B$ are disjoint, $P(A \cup B) = P(A) + P(B)$\newline \newline If not disjoint, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ \tn % Row Count 21 (+ 7) % Row 6 \SetRowColor{LightBackground} $A^c$ & $A$ complement. Everything outside $A$. $P(A^c) = 1 - P(A)$ \tn % Row Count 24 (+ 3) % Row 7 \SetRowColor{white} $A \in B$ / $A \notin B$ & $A$ is an element of $B$ / $A$ is not an element of $B$ \tn % Row Count 27 (+ 3) % Row 8 \SetRowColor{LightBackground} $A : A \in B$ & $A$ such that $A$ is an element of $B$ \tn % Row Count 29 (+ 2) % Row 9 \SetRowColor{white} n! aka Permutations & Counting method where ORDER matters. $n! = n(n-1)(n-2)\cdots 1$ counts the orderings of $n$ items. The number of ordered samples of size $k$ drawn without replacement from $n$ items is $\frac{n!}{(n-k)!} = n(n-1)(n-2)\cdots(n-k+1)$ \tn % Row Count 34 (+ 5) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Probability/Statistics \& Set Notation (cont)}} \tn % Row 10 \SetRowColor{LightBackground} $\binom{n}{k}$ aka Combinations & Counting method where order does not matter. $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ \tn % Row Count 4 (+ 4) % Row 11 \SetRowColor{white} $P(A \mid B)$ aka Conditional Probability & The probability of $A$ happening, given that $B$ occurs.\newline \newline If $A$ and $B$ are independent then $P(A \mid B) = P(A)$, as $B$ has no effect on $A$. (If $A$ and $B$ are disjoint and $B$ occurs, then $A$ cannot, so $P(A \mid B) = 0$.)\newline \newline If $A$ and $B$ are dependent, i.e. 
$B$ has an effect on the chances of $A$, then $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ (this formula holds whenever $P(B) > 0$)\newline \newline $P(A \mid B) + P(A^c \mid B) = 1$ \tn % Row Count 19 (+ 15) % Row 12 \SetRowColor{LightBackground} $P(B_i \mid A)P(A) = P(A \mid B_i)P(B_i)$ & Both sides equal $P(A \cap B_i)$ by the multiplication rule. Dividing by $P(A)$, and expanding $P(A)$ with the law of total probability, gives Bayes' rule \tn % Row Count 24 (+ 5) % Row 13 \SetRowColor{white} Independence of more than 2 events & Events $A_1, A_2, \ldots, A_m$ are independent if\newline $P\bigl(\bigcap_{i=1}^{m} A_i\bigr) = \prod_{i=1}^{m} P(A_i)$\newline and the same product rule holds for every sub-collection of the events (i.e. the probability of any of their intersections is equal to the product of the individual probabilities of the events involved)\newline \newline Independence is not transitive: if $A$ and $B$ are independent and $B$ and $C$ are independent, this does not mean that $A$ and $C$ are independent, nor does it mean they must be dependent. \tn % Row Count 44 (+ 20) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Probability/Statistics \& Set Notation (cont)}} \tn % Row 14 \SetRowColor{LightBackground} Random variable aka rv & Any variable whose value is not known prior to the experiment and is subject to chance (aka variability, aka change).\newline Each value has an associated probability (aka mass).\newline \newline An rv is a type of mapping function over the whole sample space and is associated with measure theory, i.e. an rv maps each outcome in the sample space to a number and so can transform the sample space. \tn % Row Count 16 (+ 16) % Row 15 \SetRowColor{white} Discrete & There is a countable number of possible outcomes (finite, or listable as a sequence) \tn % Row Count 18 (+ 2) % Row 16 \SetRowColor{LightBackground} Discrete Random Variable & Any function $X\colon \Omega \to$ ℝ that takes on a countable set of values, e.g. $X$ could be $S$ = sum or $M$ = max run on a sample space, getting the sum/max of each experiment outcome and constructing a new sample space out of it. 
\tn % Row Count 28 (+ 10) % Row 17 \SetRowColor{white} Probability Mass Function aka pmf & The pmf of a discrete rv $X$ is $f_X(x) = P(X = x)$. It is essentially a table/graph displaying the probabilities of all the possible values our discrete rv can take. Please refer to ``Properties of the Probability Mass Function aka pmf'' for more details. Explained here: \seqsplit{http://www.statisticshowto.com/probability-mass-function-pmf/} \tn % Row Count 44 (+ 16) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Probability/Statistics \& Set Notation (cont)}} \tn % Row 18 \SetRowColor{LightBackground} Cumulative Distribution Function aka cdf & The cdf of an rv $X$, $F(a) = P(X \leq a)$, can be used to determine the probability of $X$ falling above, below or between given values. Please refer to ``Properties of the Cumulative Distribution Function aka cdf'' for more details. Explained here: \seqsplit{http://www.statisticshowto.com/cumulative-distribution-function/} \tn % Row Count 15 (+ 15) % Row 19 \SetRowColor{white} Continuous & An uncountably infinite number of possible values, e.g. every real number in an interval. \tn % Row Count 17 (+ 2) % Row 20 \SetRowColor{LightBackground} Continuous Random Variables & A function $X\colon \Omega \to$ ℝ that can take on any value $a \in$ ℝ\newline Mass/associated probability is no longer considered for each possible value of $X$ (each single value has probability 0); instead consider the probability that $X \in (a,b)$ for $a < b$. \tn % Row Count 27 (+ 10) % Row 21 \SetRowColor{white} Probability Density Function aka pdf & The pdf $f(x)$ of a continuous rv $X$ is an integrable function such that\newline $P(a \leq X \leq b) = \int_{a}^{b} f(x)\,dx$\newline i.e. the probability of a range of values occurring is the area under the curve between points $a$ and $b$. The conditions on $f$ are\newline $f(x) \geq 0$ for every real $x$\newline $\int_{-\infty}^{\infty} f(x)\,dx = 1$, i.e. 
the complete area under the curve is 1, since it contains all outcomes.\newline \newline The cdf of a continuous rv is defined by the formula\newline $F(x) = \int_{-\infty}^{x} f(u)\,du = P(X \leq x)$ \tn % Row Count 50 (+ 23) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Probability/Statistics \& Set Notation (cont)}} \tn % Row 22 \SetRowColor{LightBackground} Expectation aka E(X) & The expected value of a random variable. When our rv is discrete, it is found using the formula\newline $E(X) = \sum_{x_i \in \Omega} x_i\,p(x_i)$\newline \newline and when the rv is continuous, using the formula\newline $E(X) = \int_{\Omega} x\,f(x)\,dx$\newline \newline To make this easier to understand: the expected value is simply the mean, and is calculated as the sum of (each possible value multiplied by its individual probability), i.e. a probability-weighted sum of the values \tn % Row Count 23 (+ 23) % Row 23 \SetRowColor{white} Variance aka Var(X) & A method of measuring how far the actual value of an rv may be from the expected value. Given a discrete rv $X$ the formula is\newline $\mathrm{Var}(X) = \sum_{x_i \in \Omega} x_i^2\,p(x_i) - \bigl(\sum_{x_i \in \Omega} x_i\,p(x_i)\bigr)^2$\newline \newline Or given a continuous rv use the formula\newline $\mathrm{Var}(X) = \int_{\Omega} x^2 f(x)\,dx - \bigl(\int_{\Omega} x\,f(x)\,dx\bigr)^2$\newline \newline In other words we sum up (the squared values multiplied by their individual probabilities) and finally deduct the squared expected value. 
\tn % Row Count 47 (+ 24) \end{tabularx} \par\addvspace{1.3em} \vfill \columnbreak \begin{tabularx}{8.4cm}{x{4 cm} x{4 cm} } \SetRowColor{DarkBackground} \mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Probability/Statistics \& Set Notation (cont)}} \tn % Row 24 \SetRowColor{LightBackground} Standard Deviation & Another measure, similar to variance, of how far the distribution spreads from the mean, i.e. the actual value vs the expected value. Simply calculated as $\sqrt{\mathrm{Var}(X)}$. The benefit of this is that it is expressed in the same unit that $X$ is expressed in, rather than in the squared unit as variance is. \tn % Row Count 15 (+ 15) \hhline{>{\arrayrulecolor{DarkBackground}}--} \end{tabularx} \par\addvspace{1.3em} % That's all folks \end{multicols*} \end{document}