# R Cheat Sheet by non_human_entity

A brief cheat sheet for students of PSYCH 413.

### Operators

- ``=``, ``<-``: assigns a value to an object
- ``>``: greater than
- ``<``: less than
- ``>=``: greater than or equal to
- ``<=``: less than or equal to
- ``==``: exactly equal to
- ``!=``: not equal to
- ``!x``: not x
- ``x | y``: x OR y
- ``x & y``: x AND y
- ``%>%``: sends something (e.g., a dataframe or function output) to a tidyverse function, i.e., it is a readable way to chain tidyverse functions instead of nesting them. Example:
  ``library(tidyverse)``
  ``msleep %>%``
  ``filter(conservation == 'domesticated') %>%``
  ``summarise(m = mean(brainwt, na.rm = TRUE),``
  ``s = sd(brainwt, na.rm = TRUE),``
  ``med = median(brainwt, na.rm = TRUE))``
  Note that the ``msleep`` dataframe is passed to ``filter()``, and the output of ``filter()`` is then passed to ``summarise()``. Written without the pipe, ``filter()`` would be nested inside ``summarise()``.
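The operators above can be tried on a small vector, and the pipe can be compared with the nested call it replaces. This is a minimal sketch, assuming the dplyr and ggplot2 packages are installed; the vector `x` and the object names `piped` and `nested` are made up for illustration.

```r
library(dplyr)                 # supplies %>%, filter(), and summarise()
msleep <- ggplot2::msleep      # the msleep data ship with ggplot2

# Comparison and logical operators work element by element.
x <- c(1, 5, 10)
x > 2 & x != 10                # FALSE TRUE FALSE
x < 2 | x == 10                # TRUE FALSE TRUE
!(x >= 5)                      # TRUE FALSE FALSE

# Piped version: read it top to bottom.
piped <- msleep %>%
  filter(conservation == "domesticated") %>%
  summarise(m = mean(brainwt, na.rm = TRUE))

# Nested version: same steps, read from the inside out.
nested <- summarise(
  filter(msleep, conservation == "domesticated"),
  m = mean(brainwt, na.rm = TRUE)
)

all.equal(piped$m, nested$m)   # TRUE: both versions give the same mean
```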

### Basic R Functions

- Access a function's help file: ``?[function name]``
- Load a csv file: ``read.csv("filename.csv", header = TRUE)``
- Install a library: ``install.packages("library name")``
- Load an installed library: ``library(library name)``
- Resize images in Google Colab: ``options(repr.plot.width = x, repr.plot.height = y)``
- Return the number of values in x: ``length(x)``
- Return the number of rows in a dataframe: ``nrow(dataframe)``
- Return the absolute value(s) in x: ``abs(x)``
- Return the sum of all the values in x: ``sum(x)``
- Return the square root of the value(s) in x: ``sqrt(x)``
- Return the mean of the values in x, with optional arguments for trimming and removing NAs: ``mean(x, trim = 0, na.rm = FALSE)``
- Return the median of the values in x, with optional argument for removing NAs: ``median(x, na.rm = FALSE)``
- Return the sample standard deviation of the values in x, with optional argument for removing NAs: ``sd(x, na.rm = FALSE)``
- Return the sample variance of the values in x, with optional argument for removing NAs: ``var(x, na.rm = FALSE)``
- Return the quartiles for x (along with the minimum and maximum), with optional argument for removing NAs: ``quantile(x, na.rm = FALSE)``
- Sort the values of x into ascending order: ``sort(x)``
- Compute the median absolute deviation of x (scaled by a constant of 1.4826 by default), with optional argument for removing NAs: ``mad(x, na.rm = FALSE)``
- Find NA values in x (returns TRUE/FALSE): ``is.na(x)``
- Paste things together into a single string: ``paste(x, y, z, sep = "")``
- Create a table of counts. Examples: ``table(x)``, ``table(x, y)``
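To see what the `na.rm` and trimming arguments do, it may help to run the functions above on a tiny vector. This is a sketch in base R; the values in `x` are invented so that the results are easy to check by hand.

```r
# Hypothetical scores with one extreme value and one missing value.
x <- c(2, 4, 4, 5, 7, 20, NA)

length(x)                          # 7 (the NA counts as a value)
sum(is.na(x))                      # 1 missing value
mean(x)                            # NA, because na.rm defaults to FALSE
mean(x, na.rm = TRUE)              # 7
mean(x, trim = 0.2, na.rm = TRUE)  # 5 (lowest and highest value dropped)
median(x, na.rm = TRUE)            # 4.5
var(x, na.rm = TRUE)               # 43.2
sd(x, na.rm = TRUE)                # square root of the variance
quantile(x, na.rm = TRUE)          # 0%, 25%, 50%, 75%, 100%
sort(x)                            # ascending order; NAs are dropped by default
paste("PSYCH", 413, sep = " ")     # "PSYCH 413"
table(c("a", "b", "a"))            # counts: a = 2, b = 1
```

The extreme value 20 pulls the ordinary mean up to 7, while the trimmed mean and the median stay close to the bulk of the data.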

### Data Frames

- Create a new data frame:
  ``Column_1 <- c("A", "B", "C")``
  ``Column_2 <- c(21, 22, NA)``
  ``new_df <- data.frame(Column_1, Column_2)``
- Add a column: ``new_df$Column_3 <- c(51, 52, 53)``
- Select a specific value (e.g., 52 = row 2, column 3): ``new_df[2, 3]``
- Select a series of values (e.g., all of row 2): ``new_df[2, c(1,2,3)]`` or ``new_df[2, ]``
- Select an entire column (e.g., column 2): ``new_df$Column_2`` or ``new_df[ , 2]``
- Isolate values that are not NAs: ``new_df$Column_2[!is.na(new_df$Column_2)]``
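Run in order, the data frame commands above give the results shown in the comments. This is simply the cheat sheet's own example executed as one script in base R; the last two lines are added here only as a quick check.

```r
# Build the data frame from two vectors, then add a third column.
Column_1 <- c("A", "B", "C")
Column_2 <- c(21, 22, NA)
new_df <- data.frame(Column_1, Column_2)
new_df$Column_3 <- c(51, 52, 53)

new_df[2, 3]                               # 52 (row 2, column 3)
new_df[2, ]                                # all of row 2
new_df$Column_2                            # 21 22 NA
new_df$Column_2[!is.na(new_df$Column_2)]   # 21 22

nrow(new_df)                               # 3
mean(new_df$Column_2, na.rm = TRUE)        # 21.5
```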

### Filter Function

Used to select specific observations from a dataframe according to a rule you specify.

- General form: ``filter(dataframe, subset rule)``
- Example 1: ``filter(heightData, Father < 60.1 | Father > 75.3)``
- Example 2: ``heightData %>% filter(Father < 60.1 | Father > 75.3)``
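The `heightData` dataframe is not included with this cheat sheet, so the sketch below builds a small stand-in to show what the filter rule keeps. It assumes the dplyr package is installed; the heights and the `Son` column are invented for illustration.

```r
library(dplyr)   # supplies filter() and %>%

# Hypothetical stand-in for heightData (heights in inches).
heightData <- data.frame(
  Father = c(59.5, 68.0, 70.2, 76.1),
  Son    = c(64.0, 69.1, 71.0, 74.3)
)

# Keep only the rows where Father is below 60.1 or above 75.3.
extremes <- heightData %>% filter(Father < 60.1 | Father > 75.3)

extremes$Father   # 59.5 76.1
nrow(extremes)    # 2
```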

### Subset Function

Used to select specific observations from a dataframe according to a rule you specify.

- General form: ``subset(dataframe, subset rule, select = c("columns to keep"))``
- Example: ``outliers <- subset(heightData, Father < 60.1 | Father > 75.3, select = c("Father"))``
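One detail worth knowing is how `subset()` treats missing values. The sketch below uses base R only and an invented stand-in for `heightData`; the name `bracketed` is made up for the comparison.

```r
# Hypothetical stand-in for heightData with one missing Father value.
heightData <- data.frame(
  Father = c(59.5, 68.0, NA, 76.1),
  Son    = c(64.0, 69.1, 70.5, 74.3)
)

# subset() keeps rows where the rule is TRUE and drops rows where it is NA.
outliers <- subset(heightData, Father < 60.1 | Father > 75.3,
                   select = c("Father"))
nrow(outliers)    # 2
names(outliers)   # "Father"

# Plain bracket indexing with the same rule keeps an all-NA row instead.
bracketed <- heightData[heightData$Father < 60.1 | heightData$Father > 75.3, ]
nrow(bracketed)   # 3
```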

### Library Functions

- ``library(tidyverse)`` or ``library(dplyr)``: aggregate data sets into a new dataframe. For example:
  ``msleep %>%``
  ``group_by(vore, conservation) %>%``
  ``summarise(m = mean(brainwt, na.rm = TRUE),``
  ``s = sd(brainwt, na.rm = TRUE))``
- ``library(rcompanion)``: calculates lambda for Tukey's ladder of powers: ``transformTukey(x, plotit = FALSE, returnLambda = TRUE)``
- ``library(WRS2)``: Winsorized variance of x: ``winvar(x, tr = .2)``
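For intuition about the Winsorized variance, here is a base R sketch of the usual definition: the lowest and highest 20% of values are replaced by their nearest remaining neighbours before the variance is taken. The helper `winsorized_var` is hypothetical and is not the WRS2 source, so it may differ from ``winvar()`` in edge cases; the scores are invented.

```r
# Hypothetical helper, not the WRS2 code: Winsorize, then take the variance.
winsorized_var <- function(x, tr = 0.2) {
  x <- sort(x[!is.na(x)])          # drop NAs and order the values
  n <- length(x)
  k <- floor(tr * n)               # number of values to replace in each tail
  if (k > 0) {
    x[1:k] <- x[k + 1]             # pull the lowest k values up
    x[(n - k + 1):n] <- x[n - k]   # pull the highest k values down
  }
  var(x)
}

scores <- c(2, 4, 4, 5, 7, 20)
var(scores)              # 43.2, inflated by the extreme value 20
winsorized_var(scores)   # variance of 4, 4, 4, 5, 7, 7
```

With six scores and ``tr = .2``, one value is replaced in each tail, which is why the extreme score of 20 no longer dominates the result.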

### Distribution Functions

Return the corresponding quantile for a given probability:

- Normal Distribution: ``qnorm(probability, mean, sd)``
- T Distribution: ``qt(probability, df, lower.tail)``
- F Distribution: ``qf(probability, df1, df2, lower.tail)``
- Chi-Square Distribution: ``qchisq(probability, df, lower.tail)``

Return the corresponding probability for a given quantile:

- Normal Distribution: ``pnorm(quantile, mean, sd)``
- T Distribution: ``pt(quantile, df, lower.tail)``
- F Distribution: ``pf(quantile, df1, df2, lower.tail)``
- Chi-Square Distribution: ``pchisq(quantile, df, lower.tail)``

Note:
- z-scores and t-scores (e.g., critical t values and test statistics) are types of quantiles.

- By default, probabilities are measured from the left (lower) tail of the distribution. Specify ``lower.tail = FALSE`` to work from the right (upper) tail instead.
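The `q` and `p` functions undo each other, and `lower.tail = FALSE` flips which tail the probability refers to. This is a short base R sketch; the degrees of freedom and the observed statistic `t_obs` are invented for illustration.

```r
# Quantile for a given probability: the critical z for a two-sided 5% test.
z_crit <- qnorm(0.975, mean = 0, sd = 1)   # about 1.96

# Probability for a given quantile: pnorm() undoes qnorm().
pnorm(z_crit, mean = 0, sd = 1)            # 0.975

# Critical t with 20 degrees of freedom (hypothetical df).
t_crit <- qt(0.975, df = 20)               # about 2.086

# Left tail (default) versus right tail.
left_tail  <- pt(t_crit, df = 20)                       # 0.975
right_tail <- pt(t_crit, df = 20, lower.tail = FALSE)   # 0.025

# Two-sided p-value for a hypothetical observed t statistic.
t_obs <- 2.5
p_two_sided <- 2 * pt(abs(t_obs), df = 20, lower.tail = FALSE)
```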

### Plotting: library(ggplot2)

- Histogram:
  ``ggplot(dataFrame, aes(x = Dep_Var)) +``
  ``geom_histogram(colour = "black", fill = "white")``
- Density plot:
  ``ggplot(dataFrame, aes(x = Dep_Var)) +``
  ``geom_density(colour = "black", fill = "pink", adjust = 1)``
- Boxplot for one sample:
  ``ggplot(dataFrame, aes(y = Dep_Var)) +``
  ``geom_boxplot()``
- Boxplot for two or more samples:
  ``ggplot(dataFrame, aes(x = Indep_Var, y = Dep_Var)) +``
  ``geom_boxplot()``
- Barplot with errorbars:
  ``ggplot(plotData, aes(x = Indep_Var, y = Dep_Var, fill = Indep_Var)) +``
  ``geom_bar(stat = "identity", colour = "black") +``
  ``geom_errorbar(aes(ymin = bottom_values, ymax = top_values), width = .25)``
- Q-Q plot for two independent samples (remove ``+ facet_wrap()`` for a single sample):
  ``ggplot(dataFrame, aes(sample = Dep_Var)) +``
  ``stat_qq() +``
  ``stat_qq_line() +``
  ``facet_wrap(~ Indep_Var)``
- Line plot of means with two predictors:
  ``ggplot(plotData, aes(x = PredictorA, y = Means, group = PredictorB, colour = PredictorB)) +``
  ``geom_line(position = position_dodge(width = 0.4)) +``
  ``geom_point(position = position_dodge(width = 0.4))``
- Scatterplot with regression line:
  ``ggplot(dataframe, aes(x = predictor, y = response)) +``
  ``geom_point() +``
  ``geom_abline(intercept = b0, slope = b1)``
Note:
- Indep_Var = Independent Variable
- Dep_Var = Dependent Variable
- plotData = Dataframe of aggregated values
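The barplot and scatterplot templates rely on objects the cheat sheet does not create: `plotData` with its `bottom_values` and `top_values` columns, and the coefficients `b0` and `b1`. One possible way to build them is sketched below, assuming dplyr and ggplot2 are installed. The built-in `mtcars` data, the choice of mean plus or minus one standard error for the error bars, and the names `bar_plot`, `fit`, and `scatter_plot` are all assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# plotData: one row per group with the mean and mean +/- 1 standard error.
plotData <- mtcars %>%
  mutate(Indep_Var = factor(cyl)) %>%
  group_by(Indep_Var) %>%
  summarise(Dep_Var = mean(mpg),
            se = sd(mpg) / sqrt(n()),
            .groups = "drop") %>%
  mutate(bottom_values = Dep_Var - se,
         top_values = Dep_Var + se)

# Barplot with errorbars, following the template.
bar_plot <- ggplot(plotData, aes(x = Indep_Var, y = Dep_Var,
                                 fill = Indep_Var)) +
  geom_bar(stat = "identity", colour = "black") +
  geom_errorbar(aes(ymin = bottom_values, ymax = top_values), width = .25)

# b0 and b1: intercept and slope taken from a fitted linear model.
fit <- lm(mpg ~ wt, data = mtcars)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["wt"]]

# Scatterplot with regression line, following the template.
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_abline(intercept = b0, slope = b1)
```

Printing `bar_plot` or `scatter_plot` draws the figure.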