\documentclass[10pt,a4paper]{article}
% Packages
\usepackage{fancyhdr} % For header and footer
\usepackage{multicol} % Allows multicols in tables
\usepackage{tabularx} % Intelligent column widths
\usepackage{tabulary} % Used in header and footer
\usepackage{hhline} % Border under tables
\usepackage{graphicx} % For images
\usepackage{xcolor} % For hex colours
%\usepackage[utf8x]{inputenc} % For unicode character support
\usepackage[T1]{fontenc} % Without this we get weird character replacements
\usepackage{colortbl} % For coloured tables
\usepackage{setspace} % For line height
\usepackage{lastpage} % Needed for total page number
\usepackage{seqsplit} % Splits long words.
%\usepackage{opensans} % Can't make this work so far. Shame. Would be lovely.
\usepackage[normalem]{ulem} % For underlining links
% Most of the following are not required for the majority
% of cheat sheets but are needed for some symbol support.
\usepackage{amsmath} % Symbols
\usepackage{MnSymbol} % Symbols
\usepackage{wasysym} % Symbols
%\usepackage[english,german,french,spanish,italian]{babel} % Languages
% Document Info
\author{Nathaliemayor}
\pdfinfo{
/Title (statistics-i.pdf)
/Creator (Cheatography)
/Author (Nathaliemayor)
/Subject (Statistics I Cheat Sheet)
}
% Lengths and widths
\addtolength{\textwidth}{6cm}
\addtolength{\textheight}{-1cm}
\addtolength{\hoffset}{-3cm}
\addtolength{\voffset}{-2cm}
\setlength{\tabcolsep}{0.2cm} % Space between columns
\setlength{\headsep}{-12pt} % Reduce space between header and content
\setlength{\headheight}{85pt} % If less, LaTeX automatically increases it
\renewcommand{\footrulewidth}{0pt} % Remove footer line
\renewcommand{\headrulewidth}{0pt} % Remove header line
\renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit
% These two commands together give roughly
% the right line height in the tables
\renewcommand{\arraystretch}{1.3}
\onehalfspacing
% Commands
\newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour
\newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols
\newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns
\newcommand{\tn}{\tabularnewline} % Required as custom column type in use
% Font and Colours
\definecolor{HeadBackground}{HTML}{333333}
\definecolor{FootBackground}{HTML}{666666}
\definecolor{TextColor}{HTML}{333333}
\definecolor{DarkBackground}{HTML}{C8B6D1}
\definecolor{LightBackground}{HTML}{F8F5F9}
\renewcommand{\familydefault}{\sfdefault}
\color{TextColor}
% Header and Footer
\pagestyle{fancy}
\fancyhead{} % Set header to blank
\fancyfoot{} % Set footer to blank
\fancyhead[L]{
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{C}
\SetRowColor{DarkBackground}
\vspace{-7pt}
{\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent
\hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}}
}
\end{tabulary}
\columnbreak
\begin{tabulary}{11cm}{L}
\vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{Statistics I Cheat Sheet}}}} \\
\normalsize{by \textcolor{DarkBackground}{Nathaliemayor} via \textcolor{DarkBackground}{\uline{cheatography.com/69859/cs/17697/}}}
\end{tabulary}
\end{multicols}}
\fancyfoot[L]{
\footnotesize
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{LL}
\SetRowColor{FootBackground}
\mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}} \\
\vspace{-2pt}Nathaliemayor \\
\uline{cheatography.com/nathaliemayor} \\
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
\SetRowColor{FootBackground}
\mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}} \\
\vspace{-2pt}Not Yet Published.\\
Updated 11th November, 2018.\\
Page {\thepage} of \pageref{LastPage}.
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
\SetRowColor{FootBackground}
\mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Sponsor}} \\
\SetRowColor{white}
\vspace{-5pt}
%\includegraphics[width=48px,height=48px]{dave.jpeg}
Measure your website readability!\\
www.readability-score.com
\end{tabulary}
\end{multicols}}
\begin{document}
\raggedright
\raggedcolumns
% Set font size to small. Switch to any value
% from this page to resize cheat sheet text:
% www.emerson.emory.edu/services/latex/latex_169.html
\footnotesize % Small font.
\begin{multicols*}{2}
\begin{tabularx}{8.4cm}{x{2.56 cm} x{5.44 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Exploratory data analysis}} \tn
% Row 0
\SetRowColor{LightBackground}
types of variables: & {\bf{Categorical}} variables: nominal (no order, e.g.\ eye color) or ordinal (ordered, e.g.\ level of education) \newline
{\bf{Numerical}} variables: discrete or continuous \tn
% Row Count 7 (+ 7)
% Row 1
\SetRowColor{white}
numerical summaries & {\bf{quantile}} Q(p): value such that a proportion p of the data is smaller than Q(p) and a proportion 1-p is larger \newline
first quartile Q1: p=0.25, {\bf{median}} Q2: p=0.5, third quartile Q3: p=0.75 \newline
{\bf{IQR}}: the interquartile range, Q3-Q1, contains the middle 50\% of the data \newline
Rank formula: p(n-1)+1; if the rank is not an integer, interpolate between the two neighbouring ordered values, weighted by the fractional part \tn
% Row Count 20 (+ 13)
% Row 2
\SetRowColor{LightBackground}
measures of center & {\bf{MODE}}: most frequent value \newline
{\bf{MEDIAN}}: Q(0.50) \newline
{\bf{MEAN}}: average, total/n \newline
for a {\bf{unimodal}} and {\bf{symmetric}} distribution, mean = median; for a right-skewed distribution, mode \textless{} median \textless{} mean \tn
% Row Count 27 (+ 7)
% Row 3
\SetRowColor{white}
\mymulticolumn{2}{x{8.4cm}}{variance and standard deviation (sd)} \tn
% Row Count 28 (+ 1)
% Row 4
\SetRowColor{LightBackground}
Graphics & {\bf{pie charts}}, \newline
{\bf{barplots}} (frequency or relative frequency (rf), any order, specific categories, e.g.
faculties), \newline
{\bf{contingency tables}} (two or more categorical variables), \newline
{\bf{mosaic plot}} (graphical translation of a contingency table; if the tiles are aligned, the variables are independent), \newline
{\bf{frequency table}} (numerical variable; frequency f, relative frequency rf = proportion, cumulative f, cumulative rf, density = rf/class width; classes are ordered), \newline
{\bf{histograms}} (graphical translation of a frequency table; bar area proportional to class frequency, bar height = density; numerical variables; order matters; each bar covers an interval rather than a single value as in a barplot), \newline
{\bf{BOXPLOT}} (box spans the IQR, whiskers reach at most 1.5*IQR beyond it; mark the median, lower bound (LB) and upper bound (UB)), \newline
{\bf{QQ-plot}} (compares two distributions, theoretical and empirical; points along the 45° line indicate the same distribution) \tn
% Row Count 54 (+ 26)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}
\begin{tabularx}{8.4cm}{x{3.68 cm} x{4.32 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Statistical inference}} \tn
% Row 0
\SetRowColor{LightBackground}
Simpson's paradox & heterogeneous sources can bias an overall proportion: divide the data into more homogeneous subgroups, e.g.\ by major ({\bf{controlling for the confounding factor}}); men chose the programs that are easiest to enter, whereas women chose the more difficult ones \newline
the solution is to use a weighted average of the admission rates \tn
% Row Count 15 (+ 15)
% Row 1
\SetRowColor{white}
sampling the population & {\bf{population}}: what we want to analyse; we want to find the population's parameters, which are true and fixed values but usually unknown \newline
{\bf{sample}}: what we have, a part of the population chosen at random; the quantities computed from it are random variables; it should be as large as possible to limit bias; a sample carries incomplete information; with a finite population, sampling without replacement can affect the results \tn
% Row Count 34 (+ 19)
\end{tabularx}
\par\addvspace{1.3em}
\vfill
\columnbreak
\begin{tabularx}{8.4cm}{x{3.68 cm} x{4.32 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Statistical inference (cont)}} \tn
%
% Row 2
\SetRowColor{LightBackground}
point estimation & {\bf{estimators}}: an estimator is a quantity calculated from the sample; it tries to estimate the true parameter of the population; it is a random variable, whereas parameters are fixed but unknown, so the parameter is known only within a certain degree of certainty: {\bf{confidence intervals}} \tn
% Row Count 12 (+ 12)
% Row 3
\SetRowColor{white}
Estimator & used to estimate a parameter and its uncertainty, e.g.\ $\mu$; the larger the sample, the more precise the estimate, because the variance decreases as N grows and the distribution concentrates around the true value \tn
% Row Count 21 (+ 9)
% Row 4
\SetRowColor{LightBackground}
central limit thm & when we sum n independent random variables from the same distribution, sum/n is a new variable that approximately follows a normal distribution when n is large; special case: proportions (binomial) \tn
% Row Count 29 (+ 8)
% Row 5
\SetRowColor{white}
estimating variance: $s^2$ and $\tilde{s}^2$ & if x follows a normal distribution, the rescaled variance estimator follows a chi-squared ($\chi^2$) distribution with n-1 degrees of freedom (the same n-1 as in variance estimation) \tn
% Row Count 35 (+ 6)
\end{tabularx}
\par\addvspace{1.3em}
\vfill
\columnbreak
\begin{tabularx}{8.4cm}{x{3.68 cm} x{4.32 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{8.4cm}}{\bf\textcolor{white}{Statistical inference (cont)}} \tn
% Row 6
\SetRowColor{LightBackground}
confidence intervals & from the central limit theorem: C is the value such that, with probability (1-$\alpha$), the estimator lies within C of the true parameter; the smaller $\alpha$, the wider the interval; the coverage is not exactly 95/100 but close to it; if the data follow a normal distribution, use the Student distribution to make the CI more precise \tn
% Row Count 13 (+ 13)
% Row 7
\SetRowColor{white}
for proportions: & $\hat{p}$ estimates the mean \tn
% Row Count 14 (+ 1)
% Row 8
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{8.4cm}}{for median} \tn
% Row Count 15 (+ 1)
% Row 9
\SetRowColor{white}
\mymulticolumn{2}{x{8.4cm}}{for variance} \tn
% Row Count 16 (+ 1)
% Row 10
\SetRowColor{LightBackground}
for the difference of means & when 0 is not in the
interval: significant difference \tn
% Row Count 19 (+ 3)
% Row 11
\SetRowColor{white}
theory of estimation & depends on the situation; the quality of an estimator can be evaluated; a good one has no bias \tn
% Row Count 23 (+ 4)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}
% That's all folks
\end{multicols*}
\end{document}