\documentclass[10pt,a4paper]{article}

% Packages
\usepackage{fancyhdr} % For header and footer
\usepackage{multicol} % Allows multicols in tables
\usepackage{tabularx} % Intelligent column widths
\usepackage{tabulary} % Used in header and footer
\usepackage{hhline} % Border under tables
\usepackage{graphicx} % For images
\usepackage{xcolor} % For hex colours
%\usepackage[utf8x]{inputenc} % For unicode character support
\usepackage[T1]{fontenc} % Without this we get weird character replacements
\usepackage{colortbl} % For coloured tables
\usepackage{setspace} % For line height
\usepackage{lastpage} % Needed for total page number
\usepackage{seqsplit} % Splits long words.
%\usepackage{opensans} % Can't make this work so far. Shame. Would be lovely.
\usepackage[normalem]{ulem} % For underlining links
% Most of the following are not required for the majority
% of cheat sheets but are needed for some symbol support.
\usepackage{amsmath} % Symbols
\usepackage{MnSymbol} % Symbols
\usepackage{wasysym} % Symbols
%\usepackage[english,german,french,spanish,italian]{babel} % Languages

% Document Info
\author{Ivan Patel (patelivan)}
\pdfinfo{
  /Title (introduction-to-regression-in-r.pdf)
  /Creator (Cheatography)
  /Author (Ivan Patel (patelivan))
  /Subject (Introduction to Regression in R Cheat Sheet)
}

% Lengths and widths
\addtolength{\textwidth}{6cm}
\addtolength{\textheight}{-1cm}
\addtolength{\hoffset}{-3cm}
\addtolength{\voffset}{-2cm}
\setlength{\tabcolsep}{0.2cm} % Space between columns
\setlength{\headsep}{-12pt} % Reduce space between header and content
\setlength{\headheight}{85pt} % If less, LaTeX automatically increases it
\renewcommand{\footrulewidth}{0pt} % Remove footer line
\renewcommand{\headrulewidth}{0pt} % Remove header line
\renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit

% These two commands together give roughly
% the right line height in the tables
\renewcommand{\arraystretch}{1.3}
\onehalfspacing

% Commands
\newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour
\newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols
\newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns
\newcommand{\tn}{\tabularnewline} % Required as custom column type in use

% Font and Colours
\definecolor{HeadBackground}{HTML}{333333}
\definecolor{FootBackground}{HTML}{666666}
\definecolor{TextColor}{HTML}{333333}
\definecolor{DarkBackground}{HTML}{9C99D1}
\definecolor{LightBackground}{HTML}{F2F2F9}
\renewcommand{\familydefault}{\sfdefault}
\color{TextColor}

% Header and Footer
\pagestyle{fancy}
\fancyhead{} % Set header to blank
\fancyfoot{} % Set footer to blank
\fancyhead[L]{
\noindent \begin{multicols}{3}
\begin{tabulary}{5.8cm}{C}
    \SetRowColor{DarkBackground}
    \vspace{-7pt}
    {\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent
        \hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}}
    }
\end{tabulary}
\columnbreak
\begin{tabulary}{11cm}{L}
    \vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{Introduction to Regression in R Cheat Sheet}}}} \\
    \normalsize{by \textcolor{DarkBackground}{Ivan Patel (patelivan)} via \textcolor{DarkBackground}{\uline{cheatography.com/135316/cs/28859/}}}
\end{tabulary}
\end{multicols}}

\fancyfoot[L]{
\footnotesize
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{LL}
  \SetRowColor{FootBackground}
  \mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}} \\
  \vspace{-2pt}Ivan Patel (patelivan) \\
  \uline{cheatography.com/patelivan} \\
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}} \\
  \vspace{-2pt}Published 15th August, 2021.\\
  Updated 15th August, 2021.\\
  Page {\thepage} of \pageref{LastPage}.
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Sponsor}} \\
  \SetRowColor{white}
  \vspace{-5pt}
  %\includegraphics[width=48px,height=48px]{dave.jpeg}
  Measure your website readability!\\
  www.readability-score.com
\end{tabulary}
\end{multicols}}

\begin{document}
\raggedright
\raggedcolumns

% Set font size to small. Switch to any value
% from this page to resize cheat sheet text:
% www.emerson.emory.edu/services/latex/latex_169.html
\footnotesize % Small font.

\begin{tabularx}{17.67cm}{x{8.635 cm} x{8.635 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{17.67cm}}{\bf\textcolor{white}{Simple Linear Regression in R}} \tn
% Row 0
\SetRowColor{LightBackground}
Regression models allow you to explore relationships between a response variable and explanatory variables. & You can use the model to make predictions. \tn
% Row Count 5 (+ 5)
% Row 1
\SetRowColor{white}
It is always a good idea to visualize a dataset, for example with scatterplots. & The intercept is the y value when x is zero. In some cases, its interpretation makes sense. For instance, on average a house with zero convenience stores nearby had a price of 8.2242 TWD per square meter. \tn
% Row Count 16 (+ 11)
% Row 2
\SetRowColor{LightBackground}
The slope is the amount y increases by if you increase x by one. & If your sole explanatory variable is categorical, the intercept is the response variable's mean for the omitted category. The coefficients of the other categories are means relative to the intercept. You can change this if you like so that the coefficients are the means of each category.
\tn
% Row Count 30 (+ 14)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{Simple regression code}} \tn
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{\# Assume you have a real estate dataset and want to build a model predicting prices using n\_convenience stores nearby within walking distance. \newline \newline \# Visualize the two variables. What is the relationship? \newline ggplot(taiwan\_real\_estate, aes(n\_convenience, price\_twd\_msq)) + \newline geom\_point(alpha = 0.5) + \newline geom\_smooth(method='lm', se=FALSE) \newline \newline \# Run a linear regression of price\_twd\_msq vs. n\_convenience \newline lm(price\_twd\_msq \textasciitilde{} n\_convenience, data = taiwan\_real\_estate) \newline \newline \# Visualize price\_twd\_msq vs. house\_age\_years using a histogram. What do you see? \newline ggplot(taiwan\_real\_estate, aes(price\_twd\_msq)) + \newline \# Make it a histogram with 10 bins \newline geom\_histogram(bins=10) + \newline \# Facet the plot so each house age group gets its own panel \newline \seqsplit{facet\_wrap(vars(house\_age\_years))} \newline \newline \# Calculate means by each age category \newline taiwan\_real\_estate \%\textgreater{}\% \newline \# Group by house age \newline \seqsplit{group\_by(house\_age\_years)} \%\textgreater{}\% \newline \# Summarize to calculate the mean house price/area \newline \seqsplit{summarize(mean\_by\_group} = mean(price\_twd\_msq)) \newline \newline \# Run a linear regression of price\_twd\_msq vs. house\_age\_years \newline mdl\_price\_vs\_age \textless{}- \seqsplit{lm(data=taiwan\_real\_estate}, price\_twd\_msq\textasciitilde{}house\_age\_years) \# Add +0 after house\_age\_years to get each category's mean.
\newline \newline \# See the result \newline mdl\_price\_vs\_age} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{x{8.635 cm} x{8.635 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{17.67cm}}{\bf\textcolor{white}{Predictions and Model objects}} \tn
% Row 0
\SetRowColor{LightBackground}
Extrapolating means making predictions outside the range of observed data. Even if you use nonsense explanatory data to make predictions, the model won't throw an error; it will still return a prediction. Understand your data to determine whether a prediction is nonsense or not. & It is useful to have the values you want to use for predictions (test data) in a tibble. That way, you can store your predictions in the same tibble and make plots. \tn
% Row Count 14 (+ 14)
% Row 1
\SetRowColor{white}
\seqsplit{coefficients(model\_object)} returns a named, numeric vector of the coefficients. & \seqsplit{fitted(model\_object)} returns predictions on the original data. \tn
% Row Count 18 (+ 4)
% Row 2
\SetRowColor{LightBackground}
\seqsplit{residuals(model\_object)} returns the actual response values minus the predicted values. They are a measure of inaccuracy. & \seqsplit{broom::tidy(model\_object)} returns the coefficients and their details. \tn
% Row Count 25 (+ 7)
% Row 3
\SetRowColor{white}
\seqsplit{broom::augment(model\_object)} returns observation-level detail such as residuals, fitted values, etc. & \seqsplit{broom::glance(model\_object)} returns model-level results (performance metrics). \tn
% Row Count 30 (+ 5)
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{x{8.635 cm} x{8.635 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{17.67cm}}{\bf\textcolor{white}{Predictions and Model objects (cont)}} \tn
% Row 4
\SetRowColor{LightBackground}
Residuals exist due to problems in the model and fundamental randomness. Extreme cases are also often due to randomness.
& Eventually, extreme cases will look more like average cases (because they don't persist over time). This is called regression to the mean. \tn
% Row Count 7 (+ 7)
% Row 5
\SetRowColor{white}
Due to regression to the mean, a baseball player does not hit as many home runs this year as he did the year before. & If there is no straight-line relationship between the response variable and the explanatory variable, it is sometimes possible to create one by transforming one or both of the variables. \tn
% Row Count 17 (+ 10)
% Row 6
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{17.67cm}}{If you transformed the response variable, you must "back-transform" your predictions.} \tn
% Row Count 19 (+ 2)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{How to make predictions and view model objects?}} \tn
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{\# Model prices and n\_convenience \newline mdl\_price\_vs\_conv \textless{}- lm(formula = price\_twd\_msq \textasciitilde{} n\_convenience, data = taiwan\_real\_estate) \newline \newline \# Create a tibble of integer values from 0 to 10.
\newline explanatory\_data \textless{}- tibble(n\_convenience = 0:10) \newline \newline \# Make predictions and store them in prediction\_data \newline prediction\_data \textless{}- explanatory\_data \%\textgreater{}\% \newline mutate(price\_twd\_msq = \seqsplit{predict(mdl\_price\_vs\_conv}, explanatory\_data)) \newline \newline \# Plot the predictions along with all points \newline ggplot(taiwan\_real\_estate, aes(n\_convenience, price\_twd\_msq)) + \newline geom\_point() + \newline geom\_smooth(method = "lm", se = FALSE) + \newline \# Add a point layer of prediction data, colored yellow \newline \seqsplit{geom\_point(color='yellow'}, data=prediction\_data) \newline \newline \# -{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-- Regression to the mean example-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}- \newline \newline \# Suppose you have data on annual returns from investing in companies in the SP500 index and you're interested in knowing if the investment performance stays the same from 2018 to 2019. \newline \newline \# Using sp500\_yearly\_returns, plot return\_2019 vs. return\_2018 \newline ggplot(data=sp500\_yearly\_returns, aes(x=return\_2018, y=return\_2019)) + \newline \# Make it a scatter plot \newline geom\_point() + \newline \# Add a line at y = x, colored green, size 1 \newline geom\_abline(slope=1, color='green', size=1) + \newline \# Add a linear regression trend line, no std. error ribbon \newline geom\_smooth(method='lm', se=FALSE) + \newline \# Fix the coordinate ratio \newline coord\_fixed() \newline \newline \# Transforming variables and back-transforming the response-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}- \newline \newline \# Assume you have Facebook advertising data: how many people see the ads and how many people click on them.
\newline \newline mdl\_click\_vs\_impression \textless{}- lm( \newline I(n\_clicks\textasciicircum{}0.25) \textasciitilde{} I(n\_impressions\textasciicircum{}0.25), \newline data = ad\_conversion \newline ) \newline explanatory\_data \textless{}- tibble( \newline n\_impressions = seq(0, 3e6, 5e5) \newline ) \newline prediction\_data \textless{}- explanatory\_data \%\textgreater{}\% \newline mutate( \newline n\_clicks\_025 = \seqsplit{predict(mdl\_click\_vs\_impression}, explanatory\_data), \newline n\_clicks = n\_clicks\_025 \textasciicircum{} 4 \newline ) \newline \newline ggplot(ad\_conversion, aes(n\_impressions \textasciicircum{} 0.25, n\_clicks \textasciicircum{} 0.25)) + \newline geom\_point() + \newline geom\_smooth(method = "lm", se = FALSE) + \newline \# Add points from prediction\_data, colored green \newline \seqsplit{geom\_point(data=prediction\_data}, color='green')} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{Quantifying Model Fit}} \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{The coefficient of determination is the proportion of variance in the response variable that is predictable from the explanatory variable. 1 means a perfect fit and 0 means the worst possible fit.} \tn
% Row Count 4 (+ 4)
% Row 1
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{For simple linear regression, the coefficient of determination is the square of the correlation between the response and explanatory variables.} \tn
% Row Count 7 (+ 3)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{Residual standard error (the square root of the sum of squared residuals divided by the degrees of freedom) is a typical difference between a prediction and an observed response.
This is sigma in broom::glance(model).} \tn
% Row Count 11 (+ 4)
% Row 3
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{Root mean squared error (RMSE) also works, but its denominator is the number of observations rather than the degrees of freedom.} \tn
% Row Count 14 (+ 3)
% Row 4
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{If the linear regression model is a good fit, then the residuals are normally distributed and their mean is zero. This assumption can be checked using the residuals vs. fitted values plot. The blue trend line should closely follow the y = 0 line.} \tn
% Row Count 19 (+ 5)
% Row 5
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{The Q-Q plot shows whether the residuals follow a normal distribution. If the points track along the diagonal line, they are normally distributed.} \tn
% Row Count 22 (+ 3)
% Row 6
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{The scale-location plot shows whether the size of the residuals gets bigger or smaller as the fitted values change.} \tn
% Row Count 25 (+ 3)
% Row 7
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{Leverage quantifies how extreme your explanatory variable values are. These values are stored under .hat in augment().} \tn
% Row Count 28 (+ 3)
% Row 8
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{Influence measures how much the model would change if you left the observation out of the dataset when modeling.
These values are stored in the .cooksd column in augment().} \tn
% Row Count 32 (+ 4)
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{Code to quantify a model's fit}} \tn
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{\# Plot three diagnostics for mdl\_price\_vs\_conv \newline library(ggplot2) \newline library(ggfortify) \newline autoplot(mdl\_price\_vs\_conv, which=1:3, nrow=3, ncol=1) \newline \newline \# Plot the three outlier diagnostics for mdl\_price\_vs\_conv \newline autoplot(mdl\_price\_vs\_conv, which=4:6, nrow=3, ncol=1)} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{Simple Logistic regression in R}} \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{Build this model when the response is binary. Predictions are probabilities and not amounts.} \tn
% Row Count 2 (+ 2)
% Row 1
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{The responses follow a logistic (s-shaped) curve. You can have the model return probabilities by specifying the response type in predict().} \tn
% Row Count 5 (+ 3)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{The odds ratio is the probability of something happening divided by the probability that it doesn't.} \tn
% Row Count 7 (+ 2)
% Row 3
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{This is sometimes easier to reason about than probabilities, particularly when you want to make decisions about choices.
For example, if a customer has a 20\% chance of churning, it may be more intuitive to say "the chance of them not churning is four times higher than the chance of them churning".} \tn
% Row Count 13 (+ 6)
% Row 4
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{One downside to probabilities and odds ratios for logistic regression predictions is that the prediction lines for each are curved.} \tn
% Row Count 16 (+ 3)
% Row 5
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{A nice property of logistic regression odds ratios is that on a log scale they change linearly with the explanatory variable.} \tn
% Row Count 19 (+ 3)
% Row 6
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{Curved prediction lines make it harder to reason about what happens to the prediction when you make a change to the explanatory variable. The logarithm of the odds ratio (the "log odds ratio") does have a linear relationship between predicted response and explanatory variable.} \tn
% Row Count 25 (+ 6)
% Row 7
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{We use confusion matrices to quantify the fit of a logistic regression.} \tn
% Row Count 27 (+ 2)
% Row 8
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{Accuracy is the proportion of correct predictions.} \tn
% Row Count 28 (+ 1)
% Row 9
\SetRowColor{white}
\mymulticolumn{1}{x{17.67cm}}{Sensitivity is the proportion of true positives: TP/(TP+FN), the proportion of observations where the actual response was true that the model also predicted to be true.} \tn
% Row Count 32 (+ 4)
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{Simple Logistic regression in R (cont)}} \tn
% Row 10
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{Specificity is the proportion of true negatives: TN/(TN+FP).
It is the proportion of observations where the actual response was false that the model also predicted to be false.} \tn
% Row Count 4 (+ 4)
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{17.67cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{Code for Logistic regression in R}} \tn
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{17.67cm}}{plt\_churn\_vs\_relationship \textless{}- ggplot(churn, \seqsplit{aes(time\_since\_first\_purchase}, has\_churned)) + \newline geom\_point() + \newline geom\_smooth(method = "lm", se = FALSE, color = "red") + \newline \# Add a glm trend line, no std error ribbon, binomial family \newline \seqsplit{geom\_smooth(method='glm'}, se=FALSE, \seqsplit{method.args=list(family=binomial))} \newline \newline \# Fit a logistic regression of churn vs. \newline \# length of relationship using the churn dataset \newline mdl\_churn\_vs\_relationship \textless{}- glm(has\_churned \textasciitilde{} \seqsplit{time\_since\_first\_purchase}, data=churn, family='binomial') \newline \newline \# See the result \newline mdl\_churn\_vs\_relationship \newline \newline \# Make predictions. "response" type returns probabilities of churning. \newline prediction\_data \textless{}- explanatory\_data \%\textgreater{}\% \newline mutate( \newline has\_churned = \seqsplit{predict(mdl\_churn\_vs\_relationship}, explanatory\_data, type = "response"), \newline most\_likely\_outcome = round(has\_churned) \# Easier to interpret.
\newline ) \newline \newline \# Update the plot \newline plt\_churn\_vs\_relationship + \newline \# Add most likely outcome points from prediction\_data, colored yellow, size 2 \newline \seqsplit{geom\_point(data=prediction\_data}, size=2, color='yellow', \seqsplit{aes(y=most\_likely\_outcome))} \newline \newline \# Odds ratio-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}-{}- \newline \newline \# From previous step \newline prediction\_data \textless{}- explanatory\_data \%\textgreater{}\% \newline mutate( \newline has\_churned = \seqsplit{predict(mdl\_churn\_vs\_relationship}, explanatory\_data, type = "response"), \newline odds\_ratio = has\_churned / (1 - has\_churned), \newline log\_odds\_ratio = log(odds\_ratio) \newline ) \newline \newline \# Using prediction\_data, plot odds\_ratio vs. \seqsplit{time\_since\_first\_purchase} \newline ggplot(data=prediction\_data, \seqsplit{aes(x=time\_since\_first\_purchase}, y=odds\_ratio)) + \newline \# Make it a line plot \newline geom\_line() + \newline \# Add a dotted horizontal line at y = 1. Indicates where churning is just as likely as not churning. \newline geom\_hline(yintercept=1, linetype='dotted')} \tn \hhline{>{\arrayrulecolor{DarkBackground}}-} \end{tabularx} \par\addvspace{1.3em} \begin{tabularx}{17.67cm}{X} \SetRowColor{DarkBackground} \mymulticolumn{1}{x{17.67cm}}{\bf\textcolor{white}{Quantifying Logistic's Model Fit}} \tn \SetRowColor{LightBackground} \mymulticolumn{1}{x{17.67cm}}{\# Get the confusion matrix. 
\newline library(yardstick) \newline \newline \# Get the actual and most likely responses from the dataset \newline actual\_response \textless{}- churn\$has\_churned \newline predicted\_response \textless{}- \seqsplit{round(fitted(mdl\_churn\_vs\_relationship))} \newline \newline \# Create a table of counts \newline outcomes \textless{}- \seqsplit{table(predicted\_response}, actual\_response) \newline \newline \# Convert outcomes to a yardstick confusion matrix \newline confusion \textless{}- conf\_mat(outcomes) \newline \newline \# Get performance metrics for the confusion matrix \newline summary(confusion, event\_level = 'second')} \tn \hhline{>{\arrayrulecolor{DarkBackground}}-} \end{tabularx} \par\addvspace{1.3em} \end{document}