\documentclass[10pt,a4paper]{article}

% Packages
\usepackage{fancyhdr} % For header and footer
\usepackage{multicol} % Allows multicols in tables
\usepackage{tabularx} % Intelligent column widths
\usepackage{tabulary} % Used in header and footer
\usepackage{hhline} % Border under tables
\usepackage{graphicx} % For images
\usepackage{xcolor} % For hex colours
%\usepackage[utf8x]{inputenc} % For unicode character support
\usepackage[T1]{fontenc} % Without this we get weird character replacements
\usepackage{colortbl} % For coloured tables
\usepackage{setspace} % For line height
\usepackage{lastpage} % Needed for total page number
\usepackage{seqsplit} % Splits long words.
%\usepackage{opensans} % Can't make this work so far. Shame. Would be lovely.
\usepackage[normalem]{ulem} % For underlining links
% Most of the following are not required for the majority
% of cheat sheets but are needed for some symbol support.
\usepackage{amsmath} % Symbols
\usepackage{MnSymbol} % Symbols
\usepackage{wasysym} % Symbols
%\usepackage[english,german,french,spanish,italian]{babel} % Languages

% Document Info
\author{julenx}
\pdfinfo{
  /Title (r-t2-3-analisis-de-la-correlacion.pdf)
  /Creator (Cheatography)
  /Author (julenx)
  /Subject (R T2.3 Análisis de la correlación Cheat Sheet)
}

% Lengths and widths
\addtolength{\textwidth}{6cm}
\addtolength{\textheight}{-1cm}
\addtolength{\hoffset}{-3cm}
\addtolength{\voffset}{-2cm}
\setlength{\tabcolsep}{0.2cm} % Space between columns
\setlength{\headsep}{-12pt} % Reduce space between header and content
\setlength{\headheight}{85pt} % If less, LaTeX automatically increases it
\renewcommand{\footrulewidth}{0pt} % Remove footer line
\renewcommand{\headrulewidth}{0pt} % Remove header line
\renewcommand{\seqinsert}{\ifmmode\allowbreak\else\-\fi} % Hyphens in seqsplit

% These two commands together give roughly
% the right line height in the tables
\renewcommand{\arraystretch}{1.3}
\onehalfspacing

% Commands
\newcommand{\SetRowColor}[1]{\noalign{\gdef\RowColorName{#1}}\rowcolor{\RowColorName}} % Shortcut for row colour
\newcommand{\mymulticolumn}[3]{\multicolumn{#1}{>{\columncolor{\RowColorName}}#2}{#3}} % For coloured multi-cols
\newcolumntype{x}[1]{>{\raggedright}p{#1}} % New column types for ragged-right paragraph columns
\newcommand{\tn}{\tabularnewline} % Required as custom column type in use

% Font and Colours
\definecolor{HeadBackground}{HTML}{333333}
\definecolor{FootBackground}{HTML}{666666}
\definecolor{TextColor}{HTML}{333333}
\definecolor{DarkBackground}{HTML}{A3A3A3}
\definecolor{LightBackground}{HTML}{F3F3F3}
\renewcommand{\familydefault}{\sfdefault}
\color{TextColor}

% Header and Footer
\pagestyle{fancy}
\fancyhead{} % Set header to blank
\fancyfoot{} % Set footer to blank
\fancyhead[L]{
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{C}
    \SetRowColor{DarkBackground}
    \vspace{-7pt}
    {\parbox{\dimexpr\textwidth-2\fboxsep\relax}{\noindent
        \hspace*{-6pt}\includegraphics[width=5.8cm]{/web/www.cheatography.com/public/images/cheatography_logo.pdf}}
    }
\end{tabulary}
\columnbreak
\begin{tabulary}{11cm}{L}
    \vspace{-2pt}\large{\bf{\textcolor{DarkBackground}{\textrm{R T2.3 Análisis de la correlación Cheat Sheet}}}} \\
    \normalsize{by \textcolor{DarkBackground}{julenx} via \textcolor{DarkBackground}{\uline{cheatography.com/168626/cs/35688/}}}
\end{tabulary}
\end{multicols}}

\fancyfoot[L]{ \footnotesize
\noindent
\begin{multicols}{3}
\begin{tabulary}{5.8cm}{LL}
  \SetRowColor{FootBackground}
  \mymulticolumn{2}{p{5.377cm}}{\bf\textcolor{white}{Cheatographer}}  \\
  \vspace{-2pt}julenx \\
  \uline{cheatography.com/julenx} \\
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Cheat Sheet}}  \\
  \vspace{-2pt}Published 29th November, 2022.\\
  Updated 25th November, 2022.\\
  Page {\thepage} of \pageref{LastPage}.
\end{tabulary}
\vfill
\columnbreak
\begin{tabulary}{5.8cm}{L}
  \SetRowColor{FootBackground}
  \mymulticolumn{1}{p{5.377cm}}{\bf\textcolor{white}{Sponsor}}  \\
  \SetRowColor{white}
  \vspace{-5pt}
  %\includegraphics[width=48px,height=48px]{dave.jpeg}
  Measure your website readability!\\
  www.readability-score.com
\end{tabulary}
\end{multicols}}

\begin{document}
\raggedright
\raggedcolumns

% Set font size to small. Switch to any value
% from this page to resize cheat sheet text:
% www.emerson.emory.edu/services/latex/latex_169.html
\footnotesize % Small font.

\begin{multicols*}{3}

\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Pearson correlation coefficient}}  \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{The following conditions must be satisfied} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}The relationship being studied between the two variables is linear \newline Both variables must be quantitative \newline Normality: both variables must be normally distributed (K-S test) \newline Homoscedasticity: the variance of $Y$ must be constant across the range of $X$ (Bartlett test)} \tn
% Row Count 8 (+ 8)
% Row 1
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{Characteristics} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}Takes values in {[}$-1$, $+1${]}, where $+1$ is a perfect positive linear correlation and $-1$ is a perfect negative linear correlation. A value of 0 means there is NO linear relationship between the variables considered.
\newline Its magnitude does not change under linear transformations of the variables (changes of origin or scale) \newline It is not the same as the slope of the regression line \newline Its significance (p-value) must be computed; if it is not significant, the correlation between the two variables should be interpreted as 0} \tn
% Row Count 20 (+ 12)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{{\bf{1.1. Graphical evidence: scatterplot}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}library(MASS) \# provides the Cars93 data \newline library(ggplot2) \newline ggplot(data = Cars93, \newline ~ aes(x = Weight, y = Horsepower)) + \newline ~ geom\_point(colour = "red4") + \newline ~ ggtitle("Scatterplot") + \newline ~ theme\_bw() + \newline ~ theme(plot.title = element\_text(hjust = 0.5))} \tn
% Row Count 27 (+ 7)
% Row 3
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{1.2. Graphical evidence: normality analysis}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}par(mfrow = c(1, 2)) \newline hist(Cars93\$Weight, breaks = 10, main = "", \newline ~ xlab = "Weight", border = "darkred") \newline hist(Cars93\$Horsepower, breaks = 10, main = "", \newline ~ xlab = "Horsepower", border = "blue") \newline par(mfrow = c(1, 1))} \tn
% Row Count 35 (+ 8)
\end{tabularx}
\par\addvspace{1.3em}
\vfill
\columnbreak
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Pearson correlation coefficient (cont)}}  \tn
% Row 4
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{{\bf{1.3.
Graphical evidence: quantile-quantile plot (Q-Q plot)}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}qqnorm(Cars93\$Weight, main = "Weight", \newline ~ col = "darkred") \newline qqline(Cars93\$Weight) \newline \seqsplit{qqnorm(Cars93\$Horsepower}, main = "Horsepower", \newline ~ col = "blue") \newline \seqsplit{qqline(Cars93\$Horsepower)}} \tn
% Row Count 7 (+ 7)
% Row 5
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}par(mfrow = c(1, 2)) \newline qqnorm(Cars93\$Weight, main = "Weight", \newline ~ col = "darkred") \newline qqline(Cars93\$Weight) \newline \seqsplit{qqnorm(Cars93\$Horsepower}, main = "Horsepower", \newline ~ col = "blue") \newline \seqsplit{qqline(Cars93\$Horsepower)} \newline par(mfrow = c(1, 1))} \tn
% Row Count 13 (+ 6)
% Row 6
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{{\bf{2.1 Formal test evidence: K-S-L test}} ($H_0$: normal distribution)} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}require(nortest) \newline \seqsplit{lillie.test(Cars93\$Weight)} \newline \seqsplit{lillie.test(Cars93\$Horsepower)}} \tn
% Row Count 17 (+ 4)
% Row 7
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{2.2 If the distribution is not normal: standardization}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}Subtract the mean from each observation and divide by the standard deviation. \newline xt = scale(x)} \tn
% Row Count 21 (+ 4)
% Row 8
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{{\bf{2.3 If the distribution is not
normal: nonlinear transformation}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}Negative skew: $x^2$ \newline Positive skew: $\sqrt{x}$ (mild), $\ln(x)$ (moderate), $1/x$ (strong) \newline Then repeat all the steps starting from the graphical evidence of normality} \tn
% Row Count 28 (+ 7)
% Row 9
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{3.1. Homoscedasticity analysis: plots}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}ggplot(data = Cars93, aes(x = Weight, \newline ~ y = Horsepower)) + geom\_point(colour = "red4") + \newline ~ geom\_segment(aes(x = 1690, y = 70, xend = 3100, \newline ~ yend = 300), linetype = "dashed") + \newline ~ geom\_segment(aes(x = 1690, y = 45, xend = 4100, \newline ~ yend = 100), linetype = "dashed") + \newline ~ ggtitle("Scatterplot") + theme\_bw() + \newline ~ theme(plot.title = element\_text(hjust = 0.5))} \tn
% Row Count 40 (+ 12)
\end{tabularx}
\par\addvspace{1.3em}
\vfill
\columnbreak
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{Pearson correlation coefficient (cont)}}  \tn
% Row 10
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{{\bf{3.2. Homoscedasticity analysis: formal test evidence, Bartlett test}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}A p-value below 5\% allows rejecting the null hypothesis $H_0$ of equal variances \newline bartlett.test(list(Cars93\$Weight, Cars93\$Horsepower)) \newline Bartlett test on the transformed variables \newline \seqsplit{bartlett.test(list(log(Cars93\$Weight)}, log(Cars93\$Horsepower)))} \tn
% Row Count 8 (+ 8)
% Row 11
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{4.1. Pearson coefficient (numerical evidence
only)}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}cor(x = Cars93\$Weight, y = log(Cars93\$Horsepower), method = "pearson")} \tn
% Row Count 11 (+ 3)
% Row 12
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{{\bf{4.2. Pearson coefficient (formal test evidence)}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}Check that the p-value is less than or equal to 0.05 \newline cor.test(x = Cars93\$Weight, y = log(Cars93\$Horsepower), alternative = "two.sided", conf.level = 0.95, method = "pearson")} \tn
% Row Count 16 (+ 5)
% Row 13
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{5. Coefficient of determination $R^2$}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}The proportion of the variance of $Y$ explained by $X$, obtained by squaring the correlation coefficient $r$ \newline $R^2$ takes values between 0 and 1. The closer it is to 1, the better $X$ explains the behaviour of $Y$. If it is close to 0 and the goal is to explain $Y$ as a function of $X$, a different explanatory variable ($X$) is needed, since this one provides almost no information about $Y$.} \tn
% Row Count 29 (+ 13)
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{When the conditions are not met for
Pearson}}  \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{{\bf{Spearman's rho coefficient}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}When the variables are categorical (non-numeric) but ordered, such as education level \newline When the values do not come from a normal distribution} \tn
% Row Count 5 (+ 5)
% Row 1
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{require(MASS)} \tn
% Row Count 6 (+ 1)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{cor.test(birthwt\$bwt, birthwt\$lwt, alternative="two.sided", method="spearman")} \tn
% Row Count 8 (+ 2)
% Row 3
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{cor.test(birthwt\$bwt, birthwt\$lwt, alternative="less", method="spearman")} \tn
% Row Count 10 (+ 2)
% Row 4
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{cor.test(birthwt\$bwt, birthwt\$lwt, alternative="greater", method="spearman")} \tn
% Row Count 12 (+ 2)
% Row 5
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{{\bf{Kendall's tau coefficient}}} \tn
\mymulticolumn{1}{x{5.377cm}}{\hspace*{6 px}\rule{2px}{6px}\hspace*{6 px}When the variables are categorical (non-numeric) but ordered, such as education level \newline When the values do not come from a normal distribution} \tn
% Row Count 17 (+ 5)
% Row 6
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{require(MASS)} \tn
% Row Count 18 (+ 1)
% Row 7
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{cor.test(birthwt\$bwt, birthwt\$lwt, alternative="two.sided", method="kendall")} \tn
% Row Count 20 (+ 2)
% Row 8
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{etc. (alternative="less" or "greater", as for Spearman)} \tn
% Row Count 21 (+ 1)
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{The correlation matrix}}  \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{cor(x = datos, method
= "pearson")} \tn
% Row Count 1 (+ 1)
% Row 1
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{pairs(x = datos, lower.panel = NULL)} \tn
% Row Count 2 (+ 1)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{require(corrplot)} \tn
% Row Count 3 (+ 1)
% Row 3
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{corrplot(corr = cor(x = datos, method = "pearson"), method = "number")} \tn
% Row Count 5 (+ 2)
% Row 4
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{require(psych)} \tn
% Row Count 6 (+ 1)
% Row 5
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{pairs.panels(x = datos, ellipses = FALSE, lm = TRUE, method = "pearson")} \tn
% Row Count 8 (+ 2)
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}
\begin{tabularx}{5.377cm}{X}
\SetRowColor{DarkBackground}
\mymulticolumn{1}{x{5.377cm}}{\bf\textcolor{white}{The partial correlation coefficient}}  \tn
% Row 0
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{We want to study the relationship between the price and the weight of cars.
This relationship is suspected to be influenced by engine power: heavier vehicles need more power and, in turn, more powerful engines are more expensive.} \tn
% Row Count 6 (+ 6)
% Row 1
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{ggplot(data = Cars93, aes(x = Weight, y = log(Price))) + \newline geom\_point(colour = "red4") + \newline ggtitle("Scatterplot") + theme\_bw() + \newline theme(plot.title = element\_text(hjust = 0.5))} \tn
% Row Count 11 (+ 5)
% Row 2
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{cor.test(x = Cars93\$Weight, y = log(Cars93\$Price), method = "pearson")} \tn
% Row Count 13 (+ 2)
% Row 3
\SetRowColor{white}
\mymulticolumn{1}{x{5.377cm}}{require(ppcor) \newline pcor.test(x = Cars93\$Weight, y = log(Cars93\$Price), \newline ~ z = Cars93\$Horsepower, method = "pearson")} \tn
% Row Count 16 (+ 3)
% Row 4
\SetRowColor{LightBackground}
\mymulticolumn{1}{x{5.377cm}}{The correlation between weight and log price is high ($r = 0.764$) and significant (p-value \textless{} 2.2e-16). However, when the relationship is studied while controlling for engine power, it remains significant (p-value = 6.288649e-05) but becomes low ($r = 0.4047$). \newline We can therefore conclude that the linear relationship between weight and log price is influenced by engine power. Once the effect of power is controlled for, the remaining linear relationship is low ($r = 0.4047$).} \tn
% Row Count 28 (+ 12)
\hhline{>{\arrayrulecolor{DarkBackground}}-}
\end{tabularx}
\par\addvspace{1.3em}

% That's all folks
\end{multicols*}

\end{document}