\documentclass[10pt,a4paper]{article}

% Packages
\usepackage{fancyhdr}           % For header and footer
\usepackage{multicol}           % Allows multicols in tables
\usepackage{tabularx}           % Intelligent column widths
\usepackage{tabulary}           % Used in header and footer
\usepackage{hhline}             % Border under tables
\usepackage{graphicx}           % For images
\usepackage{xcolor}             % For hex colours
%\usepackage[utf8x]{inputenc}    % For unicode character support
\usepackage[T1]{fontenc}        % Without this we get weird character replacements
\usepackage{colortbl}           % For coloured tables
\usepackage{setspace}           % For line height
\usepackage{lastpage}           % Needed for total page number
\usepackage{seqsplit}           % Splits long words.
%\usepackage{opensans}          % Can't make this work so far. Shame. Would be lovely.
\usepackage[normalem]{ulem}     % For underlining links
% Most of the following are not required for the majority
% of cheat sheets but are needed for some symbol support.
\usepackage{amsmath}            % Symbols
\usepackage{MnSymbol}           % Symbols
\usepackage{wasysym}            % Symbols
%\usepackage[english,german,french,spanish,italian]{babel}              % Languages

% Document Info
\author{ethanbaka}
\pdfinfo{
  /Title (gea1000-final.pdf)
  /Creator (Cheatography)
  /Author (ethanbaka)
  /Subject (GEA1000 FINAL Cheat Sheet)
}

% Lengths and widths
\addtolength{\textwidth}{6cm}
\addtolength{\textheight}{-1cm}
\addtolength{\hoffset}{-3cm}
\addtolength{\voffset}{-2cm}
\setlength{\tabcolsep}{0.2cm} % Space between columns
\setlength{\headsep}{-12pt} % Reduce space between header and content
\setlength{\headheight}{85pt} % If less, LaTeX automatically increases it
\renewcommand{\footrulewidth}{0pt} % Remove footer line
\renewcommand{\headrulewidth}{0pt} % Remove header line
\renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit
% These two commands together give roughly
% the right line height in the tables
\renewcommand{\arraystretch}{1.3}
\onehalfspacing

% Commands
\newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour
\newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols
\newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns
\newcommand{\tn}{\tabularnewline} % Required as custom column type in use

% Font and Colours
\definecolor{HeadBackground}{HTML}{333333}
\definecolor{FootBackground}{HTML}{666666}
\definecolor{TextColor}{HTML}{333333}
\definecolor{DarkBackground}{HTML}{A3A3A3}
\definecolor{LightBackground}{HTML}{F3F3F3}
\renewcommand{\familydefault}{\sfdefault}
\color{TextColor}

% Header and Footer
\pagestyle{fancy}
\fancyhead{} % Set header to blank
\fancyfoot{} % Set footer to blank
\fancyhead[L]{
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{C}
    \SetRowColor{DarkBackground}
    \vspace{-7pt}
    {\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent
        \hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}}
    }
\end{tabulary}
\columnbreak
\begin{tabulary}{11cm}{L}
    \vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{GEA1000 FINAL Cheat Sheet}}}} \\
    \normalsize{by \textcolor{DarkBackground}{ethanbaka} via \textcolor{DarkBackground}{\uline{cheatography.com/216432/cs/47277/}}}
\end{tabulary}
\end{multicols}}

\fancyfoot[L]{ \footnotesize
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{LL}
  \SetRowColor{FootBackground}
  \mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}}  \\
  \vspace{-2pt}ethanbaka \\
  \uline{cheatography.com/ethanbaka} \\
  \end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}}  \\
   \vspace{-2pt}Not Yet Published.\\
   Updated 2nd November, 2025.\\
   Page {\thepage} of \pageref{LastPage}.
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Sponsor}}  \\
  \SetRowColor{white}
  \vspace{-5pt}
  %\includegraphics[width=48px,height=48px]{dave.jpeg}
  Measure your website readability!\\
  www.readability-score.com
\end{tabulary}
\end{multicols}}


\begin{document}
\raggedright
\raggedcolumns

% Set font size to small. Switch to any value
% from this page to resize cheat sheet text:
% www.emerson.emory.edu/services/latex/latex_169.html
\footnotesize % Small font.

\begin{multicols*}{3}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Probability Sampling Methods}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{- Units are selected through a {\bf{known randomised mechanism.}} The probability of selection {\bf{need not be the same}} for every unit in the sampling frame. The element of {\bf{chance}} in the selection process {\bf{eliminates biases}} associated with selection. \newline % Row Count 5 (+ 5)
- {\bf{Simple Random Sampling (SRS)}}: A sample of size n is chosen from the sampling frame such that every unit has an equal chance of being selected, e.g. using a random number generator. Advantage: good representation. Disadvantages: non-response, time-consuming, depends on accessibility of information. \newline % Row Count 10 (+ 5)
- {\bf{Systematic Sampling}}: A random starting unit is chosen from the first k units of the sampling frame, then every k-th unit after it is selected, where k is the selection interval (frame size divided by the desired sample size). Advantage: simple. Disadvantage: may not be representative if the list is ordered in a pattern. \newline % Row Count 15 (+ 5)
- {\bf{Stratified Random Sampling}}: The population is divided into groups (strata) and SRS is applied to each stratum to form the sample. Ex: sample count during GE. Advantage: good representation. Disadvantage: needs information about the sampling frame and the strata.  \newline % Row Count 21 (+ 6)
- {\bf{Cluster Sampling}}: The population is divided into similar clusters and a fixed number of clusters is chosen using SRS. Advantages: less tedious, less time-consuming and less costly. Disadvantages: high variability if clusters are dissimilar; requires a larger sample size to achieve a low margin of error.% Row Count 27 (+ 6)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
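
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: Systematic Sampling}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with assumed numbers (not taken from the course material) to show how a selection interval can be applied. The symbols $N$, $n$, $k$ and $x$ are labels chosen for this example. \newline
Sampling frame of $N = 1000$ units, desired sample of $n = 100$ units: \newline
$k = N/n = 1000/100 = 10$ \newline
Random start $x = 7$ (any integer from 1 to $k$) \newline
Selected units: $x,\ x+k,\ x+2k,\ \dots$ \newline
i.e. $7, 17, 27, \dots, 997$ (100 units in total)%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}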

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Non-Probability Sampling}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{- {\bf{Convenience sampling}}: Subjects are chosen based on proximity and availability (Mall surveys)  \newline % Row Count 2 (+ 2)
- {\bf{Volunteer sampling}}: Subjects volunteer themselves into a sample (Online Polls)% Row Count 4 (+ 2)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Criteria for generalisability}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{0. Sampling frame $\geq$ population (the frame may include extra units, e.g. people who used to be in the population or duplicates, but must cover the whole population) \newline % Row Count 2 (+ 2)
 1. Probability sampling method implemented (selection bias $\downarrow$)  \newline % Row Count 4 (+ 2)
 2. Large sample size (variability and random error $\downarrow$) \newline % Row Count 6 (+ 2)
 3. Minimise non-response% Row Count 7 (+ 1)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Types of Variables}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{Categorical}}: Variables that take on mutually exclusive categories  (eg colours of cars) \newline % Row Count 2 (+ 2)
{\bf{Numerical}}: Variables with numerical values on which arithmetic can be performed meaningfully (eg mass)% Row Count 4 (+ 2)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Variable Sub-types}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{Ordinal}}: Categorical variables where there is some natural ordering (eg feeling on a scale of 1-5) \newline % Row Count 3 (+ 3)
{\bf{Nominal}}: Categorical variable where there is no intrinsic ordering  (eg pet ownership in SG) \newline % Row Count 5 (+ 2)
{\bf{Discrete}}: Numerical variable with gaps in the set of possible values (eg number of members in a family; 3.75 members does not exist) \newline % Row Count 8 (+ 3)
{\bf{Continuous}}: Numerical variable that can take any value in a given range (eg a time between 0 and 5 s; every value in the range has a meaningful interpretation) \newline
{\bf{Random}}: Numerical variable with probabilities assigned to each value% Row Count 13 (+ 5)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Study Design}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{Experimental study}}: The independent variable is intentionally manipulated to observe its effect on the dependent variable (change x to see change in y) \newline % Row Count 4 (+ 4)
{\bf{Observational study}}: Individuals are observed and variables are measured without any manipulation% Row Count 7 (+ 3)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{p{0.4977 cm} p{0.4977 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Blinding}}  \tn
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{{\bf{Single blinding}} is achieved when subjects do not know which group they belong to   \newline {\bf{Double blinding}} is achieved when neither the subjects nor the assessors are aware of the assignment}  \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Research Targets}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{Population: Entire group we wish to know something about  \newline % Row Count 2 (+ 2)
Sample: A proportion of the population selected in the study \newline % Row Count 4 (+ 2)
 Sampling frame: "Source Material" from which sample is drawn \newline % Row Count 6 (+ 2)
 Census: An attempt to reach out to the entire population of interest% Row Count 8 (+ 2)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Basic Rule of Rates}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{rate(A | B) $\leq$ rate(A) $\leq$ rate(A | NB), or vice versa. This means: \newline
- The closer rate(B) is to 100\%, the closer rate(A) is to rate(A | B) \newline
- If rate(B) = 50\%, then rate(A) = 0.5{[}rate(A | B) + rate(A | NB){]} \newline
- If rate(A | B) = rate(A | NB), then rate(A) = rate(A | B) = rate(A | NB)% Row Count 6 (+ 6)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
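
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: Basic Rule of Rates}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with assumed numbers. It relies on the weighted-average identity, which is stated here as background rather than quoted from the notes: \newline
rate(A) = rate(B) $\times$ rate(A | B) + rate(NB) $\times$ rate(A | NB) \newline
Assume rate(A | B) = 20\% and rate(A | NB) = 40\%. \newline
If rate(B) = 50\%: rate(A) $= 0.5 \times 0.2 + 0.5 \times 0.4 = 0.30$ \newline
If rate(B) = 90\%: rate(A) $= 0.9 \times 0.2 + 0.1 \times 0.4 = 0.22$ \newline
In both cases rate(A) lies between 20\% and 40\%, and it moves towards rate(A | B) as rate(B) grows.%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}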

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Probability, Sensitivity and Specificity}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{Probability of independent events}} \newline % Row Count 1 (+ 1)
For independent events A and B: P(A | B) = P(A) and P(A $\cap$ B) = P(A) $\times$ P(B)   \newline % Row Count 3 (+ 2)
{\bf{Sensitivity and specificity}}    \newline % Row Count 4 (+ 1)
Sensitivity = P(Test Positive | Individual is infected) \newline
Specificity = P(Test Negative | Individual is not infected)% Row Count 7 (+ 3)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
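
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: Sensitivity and Specificity}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with hypothetical counts (not real test data) to show how the two conditional rates are read off. \newline
100 infected individuals, of whom 90 test positive: \newline
Sensitivity $= 90/100 = 0.90$ \newline
900 uninfected individuals, of whom 855 test negative: \newline
Specificity $= 855/900 = 0.95$%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}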

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Correlation Coefficient}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{Measure of the {\bf{linear association}} between two variables  \newline % Row Count 2 (+ 2)
 $-1 \leq r \leq 1$. 0 to $\pm$0.3 = weak, $\pm$0.3 to $\pm$0.7 = moderate, $\pm$0.7 to $\pm$1 = strong   \newline % Row Count 4 (+ 2)
Removing outliers can increase, decrease, or cause no change to r \newline % Row Count 6 (+ 2)
r is not affected by interchanging the x and y variables: $r = \mathrm{Cov}(X,Y)/(SD_x \cdot SD_y)$ and $\mathrm{Cov}(X,Y) = \mathrm{Cov}(Y,X)$ \newline % Row Count 8 (+ 2)
r is not affected by adding a number to all values of a variable (eg for y = 2x, adding 10 to every x shifts the points to the right without changing r) \newline % Row Count 11 (+ 3)
r is not affected by multiplying all values of a variable by a positive number (multiplying by a negative number flips the sign of r)% Row Count 13 (+ 2)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
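
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Why r Survives Shifting and Scaling}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A short derivation sketch. The constants $a \neq 0$ and $b$ are introduced here for illustration. \newline
Replace every x by $ax + b$: \newline
$\mathrm{Cov}(aX+b,\,Y) = a\,\mathrm{Cov}(X,Y)$ \newline
$SD_{ax+b} = |a|\,SD_x$ \newline
So the new coefficient is \newline
$\frac{a\,\mathrm{Cov}(X,Y)}{|a|\,SD_x \cdot SD_y} = \frac{a}{|a|}\,r$ \newline
Adding $b$ never changes r. Multiplying by $a > 0$ leaves r unchanged, while $a < 0$ flips its sign.%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}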

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Outliers}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{An outlier is an observation that falls well above or below the overall bulk of the data. As a general rule, outliers should not be removed unnecessarily. \newline
x is an outlier if $x > Q_3 + 1.5 \cdot \mathrm{IQR}$ or $x < Q_1 - 1.5 \cdot \mathrm{IQR}$.  \newline % Row Count 5 (+ 5)
Left-skewed curve $\rightarrow$ peak on the right. Mean $<$ Median $<$ Mode% Row Count 7 (+ 2)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
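
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: 1.5 IQR Rule}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with assumed quartiles to show how the outlier cut-offs are computed. \newline
Assume $Q_1 = 20$ and $Q_3 = 30$, so $\mathrm{IQR} = 30 - 20 = 10$. \newline
Upper cut-off: $30 + 1.5 \times 10 = 45$ \newline
Lower cut-off: $20 - 1.5 \times 10 = 5$ \newline
A value of 50 is an outlier ($50 > 45$). A value of 8 is not ($5 \leq 8 \leq 45$).%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}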

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Confounders}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A third variable that is associated with both the independent and dependent variables. When a confounder is present, segregate the data by the confounding variable. This method is called slicing% Row Count 4 (+ 4)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Simpson's Paradox}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A phenomenon in which a trend appears in more than half of the groups of data but disappears or reverses when the groups are combined% Row Count 3 (+ 3)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
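
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: Simpson's Paradox}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with hypothetical success counts for two treatments, X and Y (made-up numbers, not real data). \newline
Group 1: X $= 9/10 = 90\%$, Y $= 80/100 = 80\%$ \newline
Group 2: X $= 30/100 = 30\%$, Y $= 2/10 = 20\%$ \newline
Combined: X $= 39/110 \approx 35.5\%$, Y $= 82/110 \approx 74.5\%$ \newline
X has the higher rate in each group, yet Y has the higher rate once the groups are combined, because X was mostly used in the group with low success rates.%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}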

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Symmetric Association}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{Rate(A | B) \textgreater{} rate(A | NB) $\Leftrightarrow$ rate(B | A) \textgreater{} rate(B | NA)  \newline % Row Count 2 (+ 2)
Rate(A | B) \textless{} rate(A | NB) $\Leftrightarrow$ rate(B | A) \textless{} rate(B | NA)  \newline % Row Count 4 (+ 2)
Rate(A | B) = rate(A | NB) $\Leftrightarrow$ rate(B | A) = rate(B | NA)% Row Count 6 (+ 2)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Establishing Association}}  \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{Positive association between A and B (for a negative association, flip every inequality sign)} \tn 
% Row Count 2 (+ 2)
% Row 1
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{Rate (A|B) \textgreater{} Rate (A|NB)} \tn 
% Row Count 3 (+ 1)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{Rate (B|A) \textgreater{} Rate (B|NA)} \tn 
% Row Count 4 (+ 1)
% Row 3
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{Rate (NA|NB) \textgreater{} Rate (NA|B)} \tn 
% Row Count 5 (+ 1)
% Row 4
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{Rate (NB|NA) \textgreater{} Rate (NB|A)} \tn 
% Row Count 6 (+ 1)
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
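
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: Checking Association}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with hypothetical counts to show that the comparisons agree with one another. \newline
Counts: A and B = 30, A and NB = 20, NA and B = 10, NA and NB = 40 \newline
rate(A | B) $= 30/40 = 75\%$ \newline
rate(A | NB) $= 20/60 \approx 33.3\%$ \newline
rate(B | A) $= 30/50 = 60\%$ \newline
rate(B | NA) $= 10/50 = 20\%$ \newline
rate(NA | NB) $= 40/60 \approx 66.7\%$ \newline
rate(NA | B) $= 10/40 = 25\%$ \newline
Every comparison points the same way, so A and B are positively associated in this table.%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}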

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Confidence Intervals}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A confidence interval is a range of values likely to contain a population parameter, based on a certain degree of confidence. \newline
We are 95\% confident that the population parameter lies within the confidence interval. \newline
Another interpretation: about 95\% of the researchers who repeat the experiment will obtain intervals that contain the population parameter. \newline
It is a common mistake to say that there is a 95\% chance that the population parameter lies within the confidence interval. \newline
{\bf{Properties of confidence intervals}} \newline
- The larger the sample size, the smaller the random error and the narrower the confidence interval \newline
- The higher the confidence level, the wider the confidence interval   \newline % Row Count 14 (+ 14)
Population mean: $\bar{x} \pm t^{*} \times \frac{s}{\sqrt{n}}$ \newline % Row Count 15 (+ 1)
Population proportion: $p^{*} \pm z^{*} \times \sqrt{\frac{p^{*}(1-p^{*})}{n}}$% Row Count 16 (+ 1)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
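
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: CI for a Proportion}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with assumed survey numbers. Here $p^{*}$ denotes the sample proportion and $z^{*}$ the critical value, taken as 1.96 for 95\% confidence. \newline
Assume $p^{*} = 0.4$ and $n = 600$. \newline
$\sqrt{\frac{0.4 \times 0.6}{600}} = \sqrt{0.0004} = 0.02$ \newline
Margin of error: $1.96 \times 0.02 = 0.0392$ \newline
95\% CI: $0.4 \pm 0.0392$, i.e. from 0.3608 to 0.4392%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}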

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Hypothesis Testing}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{- {\bf{The null hypothesis}} asserts the stand of no effect, meaning that the variations seen in the sample are not inherent in the population and occurred by random chance when the sample was chosen  \newline % Row Count 4 (+ 4)
- {\bf{The alternative hypothesis}} is what we wish to confirm and pit against the null hypothesis. Through hypothesis testing, we wish to reject the null hypothesis in favour of the alternative hypothesis  \newline % Row Count 9 (+ 5)
- If p-value $<$ significance level (SL), reject the null hypothesis; if p-value $\geq$ SL, do not reject the null hypothesis% Row Count 10 (+ 1)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
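
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Worked Example: p-value Decision}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{A sketch with assumed values to show how the decision rule is applied. The significance level of 0.05 is a common choice, assumed here. \newline
p-value $= 0.03$: since $0.03 < 0.05$, reject the null hypothesis \newline
p-value $= 0.20$: since $0.20 \geq 0.05$, do not reject the null hypothesis%
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}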

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{t test and chi square}}  \tn
\SetRowColor{LightBackground}
\mymulticolumn{1}{p{5.377cm}}{\vspace{1px}\centerline{\includegraphics[width=5.1cm]{/web/www.cheatography.com/public/uploads/ethanbaka_1762060906_Screenshot 2025-11-02 131655.png}}} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Ecological and Atomistic Fallacies}}  \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{- {\bf{Ecological Fallacy}}: inferring a correlation about individuals from aggregated data (eg a country has a high average income, so an individual in it is assumed to be wealthy) \newline % Row Count 4 (+ 4)
- {\bf{Atomistic Fallacy}}: generalising a correlation observed among individuals to the aggregate level \newline % Row Count 7 (+ 3)
(eg one person with higher education earns more money, so higher education across the country is assumed to lead to higher national income)% Row Count 10 (+ 3)
} \tn 
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}


% That's all folks
\end{multicols*}

\end{document}