\documentclass[10pt,a4paper]{article}

% Packages
\usepackage{fancyhdr} % For header and footer
\usepackage{multicol} % Allows multicols in tables
\usepackage{tabularx} % Intelligent column widths
\usepackage{tabulary} % Used in header and footer
\usepackage{hhline} % Border under tables
\usepackage{graphicx} % For images
\usepackage{xcolor} % For hex colours
%\usepackage[utf8x]{inputenc} % For unicode character support
\usepackage[T1]{fontenc} % Without this we get weird character replacements
\usepackage{colortbl} % For coloured tables
\usepackage{setspace} % For line height
\usepackage{lastpage} % Needed for total page number
\usepackage{seqsplit} % Splits long words.
%\usepackage{opensans} % Can't make this work so far. Shame. Would be lovely.
\usepackage[normalem]{ulem} % For underlining links
% Most of the following are not required for the majority
% of cheat sheets but are needed for some symbol support.
\usepackage{amsmath} % Symbols
\usepackage{MnSymbol} % Symbols
\usepackage{wasysym} % Symbols
%\usepackage[english,german,french,spanish,italian]{babel} % Languages

% Document Info
\author{Remidy08}
\pdfinfo{
  /Title (hands-on-machine-learning.pdf)
  /Creator (Cheatography)
  /Author (Remidy08)
  /Subject (Hands-On Machine Learning Cheat Sheet)
}

% Lengths and widths
\addtolength{\textwidth}{6cm}
\addtolength{\textheight}{-1cm}
\addtolength{\hoffset}{-3cm}
\addtolength{\voffset}{-2cm}
\setlength{\tabcolsep}{0.2cm} % Space between columns
\setlength{\headsep}{-12pt} % Reduce space between header and content
\setlength{\headheight}{85pt} % If less, LaTeX automatically increases it
\renewcommand{\footrulewidth}{0pt} % Remove footer line
\renewcommand{\headrulewidth}{0pt} % Remove header line
\renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit
% These two commands together give roughly
% the right line height in the tables
\renewcommand{\arraystretch}{1.3}
\onehalfspacing

% Commands
\newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour
\newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols
\newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns
\newcommand{\tn}{\tabularnewline} % Required as custom column type in use

% Font and Colours
\definecolor{HeadBackground}{HTML}{333333}
\definecolor{FootBackground}{HTML}{666666}
\definecolor{TextColor}{HTML}{333333}
\definecolor{DarkBackground}{HTML}{A32410}
\definecolor{LightBackground}{HTML}{F9F1F0}
\renewcommand{\familydefault}{\sfdefault}
\color{TextColor}

% Header and Footer
\pagestyle{fancy}
\fancyhead{} % Set header to blank
\fancyfoot{} % Set footer to blank
\fancyhead[L]{
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{C}
    \SetRowColor{DarkBackground}
    \vspace{-7pt}
    {\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent
        \hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}}
    }
\end{tabulary}
\columnbreak
\begin{tabulary}{11cm}{L}
    \vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{Hands-On Machine Learning Cheat Sheet}}}} \\
    \normalsize{by \textcolor{DarkBackground}{Remidy08} via \textcolor{DarkBackground}{\uline{cheatography.com/159206/cs/34123/}}}
\end{tabulary}
\end{multicols}}
\fancyfoot[L]{ \footnotesize
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{LL}
  \SetRowColor{FootBackground}
  \mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}} \\
  \vspace{-2pt}Remidy08 \\
  \uline{cheatography.com/remidy08} \\
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}} \\
  \vspace{-2pt}Not Yet Published.\\
  Updated 15th September, 2022.\\
  Page {\thepage} of \pageref{LastPage}.
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Sponsor}} \\
  \SetRowColor{white}
  \vspace{-5pt}
  %\includegraphics[width=48px,height=48px]{dave.jpeg}
  Measure your website readability!\\
  www.readability-score.com
\end{tabulary}
\end{multicols}}

\begin{document}
\raggedright
\raggedcolumns

% Set font size to small. Switch to any value
% from this page to resize cheat sheet text:
% www.emerson.emory.edu/services/latex/latex_169.html
\footnotesize % Small font.

\begin{multicols*}{3}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Tips}} \tn
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{Even though the RMSE is generally the preferred performance measure for regression tasks, in some contexts you may prefer to use another function. For example, if the data contain many outlier districts, you may consider using the Mean Absolute Error (MAE), which is less sensitive to outliers. \newline % Row Count 6 (+ 6)
Computing the root of a sum of squares (RMSE) corresponds to the Euclidean norm: it is the notion of distance you are familiar with.
It is also called the $\ell_2$ norm, noted $\| \cdot \|_2$ (or just $\| \cdot \|$).% Row Count 11 (+ 5)
} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{2.4885 cm} x{2.4885 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Handling Text and Categorical Attributes}} \tn
% Row 0
\SetRowColor{LightBackground}
Converts classes into numbers & from \seqsplit{sklearn.preprocessing} import LabelEncoder \tn
% Row Count 3 (+ 3)
% Row 1
\SetRowColor{white}
 & encoder = LabelEncoder() \tn
% Row Count 5 (+ 2)
% Row 2
\SetRowColor{LightBackground}
 & \seqsplit{housing\_cat\_encoded} = \seqsplit{encoder.fit\_transform(columns} with categories) \tn
% Row Count 9 (+ 4)
% Row 3
\SetRowColor{white}
Turns a categorical attribute into a sparse matrix in which each column is a class and each row is an observation & from \seqsplit{sklearn.preprocessing} import OneHotEncoder \tn
% Row Count 15 (+ 6)
% Row 4
\SetRowColor{LightBackground}
 & encoder = OneHotEncoder() \tn
% Row Count 17 (+ 2)
% Row 5
\SetRowColor{white}
 & housing\_cat\_1hot = \seqsplit{encoder.fit\_transform(housing\_cat\_encoded.reshape(-1},1)) \tn
% Row Count 21 (+ 4)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{One issue with the LabelEncoder representation is that ML algorithms will assume that two nearby values are more similar than two distant values. One-hot encoding avoids this.} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{2.43873 cm} x{2.53827 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Visualizing data}} \tn
% Row 0
\SetRowColor{LightBackground}
Scatter plot & \seqsplit{data.plot(kind="scatter"}, x="longitude", y="latitude", alpha=0.1 (makes the points transparent, so high-density areas stand out), s=column (determines the size of the points), \seqsplit{cmap=plt.get\_cmap("jet")} (color scheme),
colorbar=True (shows a color bar), label='pop' (label of the points), c='column' (the column that determines each point's color)) \tn
% Row Count 19 (+ 19)
% Row 1
\SetRowColor{white}
Places a legend on the axes & plt.legend() \tn
% Row Count 21 (+ 2)
% Row 2
\SetRowColor{LightBackground}
Grid of scatter plots, with histograms on the diagonal & from \seqsplit{pandas.plotting} import scatter\_matrix \tn
% Row Count 24 (+ 3)
% Row 3
\SetRowColor{white}
 & scatter\_matrix(housing{[}list of columns{]}, figsize=(12, 8)) \tn
% Row Count 27 (+ 3)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{Some attributes have a tail-heavy distribution, so you may want to transform them (e.g., by computing their logarithm).} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{2.38896 cm} x{2.58804 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Feature Scaling}} \tn
% Row 0
\SetRowColor{LightBackground}
from sklearn.pipeline import Pipeline & Takes a list of name/estimator pairs defining a sequence of steps.
All but the last estimator must be transformers. \tn
% Row Count 6 (+ 6)
% Row 1
\SetRowColor{white}
\mymulticolumn{2}{x{5.377cm}}{StandardScaler() (from \seqsplit{sklearn.preprocessing}) standardizes each attribute to zero mean and unit variance.} \tn
% Row Count 7 (+ 1)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{Most Machine Learning algorithms don't perform well when the input numerical attributes have very different scales.} \tn
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{p{0.4977 cm} p{0.4977 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Training and Evaluating on the Training Set}} \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{2}{x{5.377cm}}{} \tn
% Row Count 0 (+ 0)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{3.08574 cm} x{1.89126 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Correlations}} \tn
% Row 0
\SetRowColor{LightBackground}
Correlation matrix & data.corr() \tn
% Row Count 1 (+ 1)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

\begin{tabularx}{5.377cm}{x{2.4885 cm} x{2.4885 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Data cleaning}} \tn
% Row 0
\SetRowColor{LightBackground}
Drops rows with NA values in the given column & housing.dropna(subset={[}"total\_bedrooms"{]}) \tn
% Row Count 3 (+ 3)
% Row 1
\SetRowColor{white}
Returns the data set without a column or row (here, a column) & \seqsplit{housing.drop("total\_bedrooms"}, axis=1) \tn
% Row Count 7 (+ 4)
% Row 2
\SetRowColor{LightBackground}
Fills NA values with the given value & housing{[}"total\_bedrooms"{]}.fillna(value) \tn
% Row Count 10 (+ 3)
% Row 3
\SetRowColor{white}
Imputer & from sklearn.impute import SimpleImputer \tn
% Row Count 12 (+ 2)
% Row 4
\SetRowColor{LightBackground}
Replace missing values using a descriptive
statistic (e.g., mean, median, or most frequent) along each column, or using a constant value & imputer = \seqsplit{SimpleImputer(strategy="median")} \tn
% Row Count 19 (+ 7)
% Row 5
\SetRowColor{white}
Computes the median of each attribute and stores the result in the imputer's statistics\_ instance variable & imputer.fit(housing\_num) \tn
% Row Count 25 (+ 6)
% Row 6
\SetRowColor{LightBackground}
Returns the computed values (here, the medians) & imputer.statistics\_ \tn
% Row Count 27 (+ 2)
% Row 7
\SetRowColor{white}
Replaces missing values with the learned medians (returns a NumPy array) & X = \seqsplit{imputer.transform(housing\_num)} \tn
% Row Count 31 (+ 4)
\end{tabularx}
\par\addvspace{1.3em}

\vfill
\columnbreak
\begin{tabularx}{5.377cm}{x{2.4885 cm} x{2.4885 cm} }
\SetRowColor{DarkBackground}
\mymulticolumn{2}{x{5.377cm}}{\bf\textcolor{white}{Data cleaning (cont)}} \tn
% Row 8
\SetRowColor{LightBackground}
Converts the array back to a DataFrame & housing\_tr = pd.DataFrame(X, \seqsplit{columns=housing\_num}.columns) \tn
% Row Count 3 (+ 3)
\hhline{>{\arrayrulecolor{DarkBackground}}--}
\end{tabularx}
\par\addvspace{1.3em}

% That's all folks
\end{multicols*}

\end{document}