R Cheat Sheet (DRAFT)

Cheat sheet on general R and R libraries

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Data Structure

Vectors
Entries must all be of the same type.
Arrays
Multidimensional, all of the same type. A 2D array is a matrix.
Data frames
A list of vectors of the same length. These can be of different types. Each has a name.
Lists
Entries are completely general. Good for returning output of a function.
list(vec, num, char)
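A minimal sketch of the four structures above. All object names and values are made up for illustration.

```r
# Vector: every entry has the same type
vec <- c(1.5, 2, 3)

# Array: multidimensional, single type; a 2D array is a matrix
arr <- array(1:24, dim = c(2, 3, 4))
mat <- array(1:6, dim = c(2, 3))

# Data frame: named columns of equal length, possibly of different types
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# List: entries can be anything, including other structures
out <- list(vec, 42, "text", df)

is.matrix(mat)  # TRUE
length(out)     # 4
```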

Data Types

Numeric
is.numeric(x)
to check if x is numeric
Character
is.character(x)
to check if x is character
Logical
is.logical(x)
to check if x is logical
Factor
is.factor(x)
to check if x is a factor. Factors are stored as integer codes with character labels (levels), but is.numeric() returns FALSE for them.
factor(x)
coerce vector x into a factor.
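A short sketch of the type checks, using a made-up vector x. Note that a factor stores integer codes internally but does not count as numeric.

```r
x <- c(2, 4, 4, 8)
is.numeric(x)        # TRUE
is.character("a")    # TRUE
is.logical(x > 3)    # TRUE

f <- factor(x)
is.factor(f)         # TRUE
is.numeric(f)        # FALSE: a factor is not a numeric vector
levels(f)            # "2" "4" "8"
as.integer(f)        # 1 2 2 3 (the underlying integer codes)

# To recover the original numbers, go through the labels
as.numeric(as.character(f))  # 2 4 4 8
```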

Creating Vectors

c(1, 2, 3)
1:7
seq(from=1, to=10, by=.5)
rep(1:5, each=3, times=2)
scan("filename")
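What the constructors above return, sketched with smaller arguments so the output fits on one line. The variable names are illustrative only.

```r
v1 <- c(1, 2, 3)                        # 1 2 3
v2 <- 1:7                               # 1 2 3 4 5 6 7
v3 <- seq(from = 1, to = 3, by = 0.5)   # 1.0 1.5 2.0 2.5 3.0
v4 <- rep(1:2, each = 2, times = 2)     # 1 1 2 2 1 1 2 2
```

each repeats every element first; times then repeats the whole result.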

Extracting Elements from Vectors

x[c(2,17,4)]
By index
x[-c(2,17,4)]
By excluding some indices
x[x<3] or x[y=="female"]
By logical statement
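The three extraction styles on a small made-up vector. x and y are hypothetical and must have the same length for the logical form.

```r
x <- c(5, 1, 8, 2, 9)
y <- c("male", "female", "female", "male", "female")

x[c(2, 4)]        # 1 2      (by index)
x[-c(2, 4)]       # 5 8 9    (everything except indices 2 and 4)
x[x < 3]          # 1 2      (by condition on x itself)
x[y == "female"]  # 1 8 9    (by condition on a parallel vector)
```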

Vector Indices

which.max(x), which.min(x), which(x<3)
Extract index/indices of max, min, < 3 values in vector x
order(x)
Return the indices that put x in ascending order; x[order(x)] or sort(x) gives the sorted vector.
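A quick sketch on a made-up vector. order() returns positions rather than values, so it is usually used as an index.

```r
x <- c(5, 1, 8, 2, 9)

which.max(x)   # 5
which.min(x)   # 2
which(x < 3)   # 2 4
order(x)       # 2 4 1 3 5
x[order(x)]    # 1 2 5 8 9, same as sort(x)
```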

Read File

scan(file="n.txt", what = "character", quote= " ")
file = file name, what = the type of data to be read, quote = the set of quoting characters
read.csv(file="name.csv")
read csv file
readLines(con="name.txt")
read txt file line by line
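A self-contained sketch of the three readers. It writes two temporary files first; the file contents and variable names are invented for the example.

```r
# Hypothetical temporary files so the example runs anywhere
txt <- tempfile(fileext = ".txt")
csv <- tempfile(fileext = ".csv")
writeLines(c("alpha beta", "gamma"), txt)
write.csv(data.frame(id = 1:2, score = c(3.5, 4.0)), csv, row.names = FALSE)

# scan() splits on whitespace and returns one vector
words <- scan(file = txt, what = "character", quiet = TRUE)  # "alpha" "beta" "gamma"

# readLines() returns one string per line (its first argument is con)
lines <- readLines(con = txt)                                # "alpha beta" "gamma"

# read.csv() returns a data frame
df <- read.csv(file = csv)
```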

Function

sqr <- function(x) { return(x*x) }
sqr(3)
to call function
if(x>3){return(x)}
if statement
invisible()
Returns a value like return(), but the value is not auto-printed to screen
cat()
Prints like print(), without quotes or index prefixes, but is valid only for atomic types (logical, integer, real, complex, character) and names
system.time()
Output time taken to run a function. Output user, system, elapsed time.
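A sketch tying the function entries together. cap and quiet_sqr are hypothetical helper names, not part of R.

```r
sqr <- function(x) {
  return(x * x)
}
sqr(4)  # 16

# if inside a function: return early when the condition holds
cap <- function(x) {
  if (x > 3) {
    return(3)
  }
  x
}
cap(10)  # 3
cap(2)   # 2

# invisible(): the value is returned but not auto-printed at the console
quiet_sqr <- function(x) invisible(x * x)
quiet_sqr(4)        # prints nothing
y <- quiet_sqr(4)   # y is 16

# cat() writes bare text; print() shows the R representation
cat("a", 1, TRUE, "\n")   # a 1 TRUE
print("a")                # [1] "a"

# system.time() reports user, system and elapsed seconds
timing <- system.time(for (i in 1:1e5) sqrt(i))
timing[["elapsed"]]
```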

List

list$sdev
Extract element by name
list[["sdev"]]
Extract element by name (single brackets, list["sdev"], return a one-element list instead)
list[[1]]
Extract element by index
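A small sketch of the difference between the extraction forms, using a made-up list named res.

```r
res <- list(sdev = c(1.2, 0.8), center = 0)

res$sdev        # 1.2 0.8  (the element itself)
res[["sdev"]]   # 1.2 0.8  (same, name given as a string)
res[[1]]        # 1.2 0.8  (by position)

sub_list <- res["sdev"]   # single brackets keep the list wrapper
class(sub_list)           # "list"
length(sub_list)          # 1
```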

Matrix

matrix(1:8, nrow=4)
Creates a matrix with 4 rows and 2 columns. 1:4 in first column, 5:8 in second column.
cbind(1:4, 5:8)
Creates the same matrix as above.
rownames(x) <- letters[1:4]
Give row names
colnames(x) <- letters[1:2]
Give column names
*
Element-wise multiplication
%*%
Matrix multiplication
solve(x)
Inverse of a matrix x
as.matrix(dataframe)
Converts an all-numeric data frame to a numeric matrix
apply(x, 2, mean)
Applies a function to every row or column. Margin = 1 operates on rows, 2 on columns.
x[1,2]
Extract element on row 1, col 2 of matrix x
x[,2]
Extract elements on col 2
x[,-2]
Extract elements not on col 2
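A sketch of the matrix operations on small made-up matrices. The column names "u" and "v" are arbitrary; a 4 x 2 matrix needs exactly two of them.

```r
x <- matrix(1:8, nrow = 4)
all(x == cbind(1:4, 5:8))   # TRUE
rownames(x) <- letters[1:4]
colnames(x) <- c("u", "v")

x[1, 2]            # 5
x["b", "v"]        # 6
x[, 2]             # column 2: 5 6 7 8
apply(x, 2, mean)  # column means: 2.5 6.5
apply(x, 1, sum)   # row sums: 6 8 10 12

m <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns are (1, 2) and (3, 4)
ew <- m * m        # element-wise: rows (1, 9) and (4, 16)
mp <- m %*% m      # matrix product: rows (7, 15) and (10, 22)
inv <- solve(m)    # inverse, so inv %*% m is the identity
```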
 

Regular Expression

grep("regexpr", vector)
Return the indices of the elements of a vector that match a set of characters (or a pattern)
grepl("regexpr", vector)
Return TRUE or FALSE for each element of a vector on the basis of whether it matches a set of characters
regexpr("regexpr", vector)
Tells you which elements match, where they match, and how long each match is. Matches the first occurrence of pattern in an element.
gregexpr("regexpr", vector)
Same as regexpr, but matches every occurrence of pattern in an element.
gsub("regexpr", "replacement", vector)
String substitution: replaces every match of the pattern in each element (sub() replaces only the first).
Curr.n
Single wildcard character (.), e.g. Curr.n matches "Curran", "Curren" and "Currin"
Curr(a|e|i)n
Alternation. Matches "Curran", "Curren" and "Currin"
Metacharacter
If a character is a regex metacharacter then it has a special meaning to the RegExp interpreter. The metacharacters are ., [ ], [^ ], \, ?, *, +, {,}, ^, $, \<, \>, | and (). Escape one by preceding it with a double backslash \\.
[0-9]
Will match any digit from 0 to 9
[a-z]
Will match any lower case letter from a to z
[A-Z0-9]
Will match any uppercase letter from A to Z or any digit from 0 to 9
[:alpha:]
Alphabetic (only letters). In R these classes go inside a bracket expression, e.g. [[:alpha:]]
[:lower:]
Lowercase letters
[:upper:]
Uppercase letters
[:digit:]
Digits
[:alnum:]
Alphanumeric (letters and digits)
[:space:]
White space
[:punct:]
Punctuation
?
Matches at most 1 time; optional string
*
Matches at least 0 times
+
Matches at least 1 time
{a,b}
Matches from a to b occurrences of the previous pattern
{a,}
Matches a or more occurrences of the previous pattern
[CK](u|a)r{1,2}(i|e)*n
Looks for a pattern that matches C or K, then u or a, then r 1 to 2 times, then i or e zero or more times, then n
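A sketch of the matching functions applied to the combined pattern above. The surnames vector is made up; "Curran" is included to show a non-match, since a is not allowed before the final n.

```r
surnames <- c("Curren", "Karin", "Kurien", "Curran", "Smith")
pat <- "[CK](u|a)r{1,2}(i|e)*n"

grep(pat, surnames)    # 1 2 3
grepl(pat, surnames)   # TRUE TRUE TRUE FALSE FALSE

m <- regexpr(pat, surnames)
as.integer(m)              # 1 1 1 -1 -1  (start of first match, -1 = no match)
attr(m, "match.length")    # 6 5 6 -1 -1

gsub("r", "R", "Curren")   # "CuRRen" (every match replaced)
sub("r", "R", "Curren")    # "CuRren" (first match only)

grepl("Curr.n", c("Curran", "Currn"))   # TRUE FALSE
grepl("\\.", c("a.b", "ab"))            # TRUE FALSE (escaped dot is literal)
grepl("[[:digit:]]", c("a1", "b"))      # TRUE FALSE
```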
Back Substitution
Use round brackets in regexp to capture the match of interest. Use \\1 \\2 ...\\n backreference operators to retrieve the information we matched.
(^[0-9][.])[ ]+([A-Za-z]+$)
Example use of round brackets in regexp. \\1 extracts information in first round bracket, \\2 extracts information in second round bracket.
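A sketch of back substitution with sub(), using the example pattern and a made-up input string s.

```r
s <- "1.   Introduction"
pat <- "(^[0-9][.])[ ]+([A-Za-z]+$)"

sub(pat, "\\1", s)       # "1."
sub(pat, "\\2", s)       # "Introduction"
sub(pat, "\\2 \\1", s)   # "Introduction 1."
```

The whole match is replaced by whatever the replacement string builds from the captured groups.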
substr(string, start, stop)
Extract substrings. substr('abcdef', 2, 4) returns "bcd".
paste(x, y, sep = ' ', collapse =' ' )
Paste elements of x and y (more are allowed). sep = separator between corresponding elements of x and y. collapse = if given, separator used to join the pasted results into a single string.
strsplit(vector of strings, split=' ')
Separate strings in vector based on separator set in split
Regular expressions provide a way of matching patterns in text.
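A sketch of the string helpers (substr, paste, strsplit) on made-up inputs, showing how sep and collapse differ.

```r
substr("abcdef", 2, 4)                        # "bcd"

paste("a", "b", sep = "-")                    # "a-b"
p1 <- paste(c("a", "b"), c("x", "y"), sep = "-")                   # "a-x" "b-y"
p2 <- paste(c("a", "b"), c("x", "y"), sep = "-", collapse = "+")   # "a-x+b-y"

# strsplit() returns a list with one character vector per input string
parts <- strsplit(c("a b", "c d e"), split = " ")
parts[[1]]   # "a" "b"
parts[[2]]   # "c" "d" "e"
```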

R plot

par(mfrow=c(3,3))
Set the plotting area to a 3 x 3 grid of panels
apply(matrix, 2, hist, xlim=c(-4, 4))
For each column in matrix, plot a histogram with x-axis limits -4 to 4
rnorm(n, mean=1, sd=1)
Random number generation following a normal distribution
lm(y~x, data=data)
linear regression
abline(lm(y~x))
Add the fitted regression line to the current plot
plot(x, y)
plot points
main, xlim
Arguments that can be passed to graphical functions: title, x-axis range
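A sketch combining the plotting entries above on simulated data. The seed, sample sizes and the slope of 2 are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# 3 x 3 grid: one histogram per column of a 9-column matrix
m <- matrix(rnorm(900), ncol = 9)
par(mfrow = c(3, 3))
invisible(apply(m, 2, hist, xlim = c(-4, 4)))

# Scatter plot with a fitted regression line
d <- data.frame(x = rnorm(50))
d$y <- 2 * d$x + rnorm(50, sd = 0.5)   # hypothetical linear relationship
fit <- lm(y ~ x, data = d)

par(mfrow = c(1, 1))                   # back to a single panel
plot(d$x, d$y, main = "y against x", xlim = c(-4, 4))
abline(fit)                            # draw the fitted line on top
```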

R graphics

Bitmap
Graphic format, pixelwise representation of your screen. If >1000 points/lines, use Bitmap format instead of Vector. Bitmap formats are bmp, png, jpg.
Vector
Graphic format, uses a set of basic plotting tools (point, line, etc) to describe a plot. Looks better, especially when you change devices/resolution. Vector formats are pdf, eps, wmf.
pdf(file="myplot.pdf", width=5, height=5)
Saving to pdf format. Many different commands (jpeg, png, postscript) depending on the output type you want. Call dev.off() when done to close the file.
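A minimal sketch of the open, draw, close cycle for a file device. The output path is a hypothetical temporary location; pdf() takes the path through its file argument.

```r
out <- file.path(tempdir(), "myplot.pdf")  # hypothetical output path
pdf(file = out, width = 5, height = 5)     # open the PDF device (size in inches)
plot(1:10, (1:10)^2, main = "Saved plot")
invisible(dev.off())                       # close the device so the file is written

file.exists(out)  # TRUE
```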
Base R vs ggplot
Base R: You control everything, great power, great responsibility. ggplot: Nice looking defaults, can be tough if you want something unusual.
Base R - environment set up
par(mfrow=c(2, 2), mex=0.5) etc.
Base R - type of plot
scatterplot (plot), histogram (hist), boxplot, barplot, dotplot (stripchart)
Base R - graph bits
points, lines, legend, text, box, axis, abline, title, polygon, rect.
Base R - graph parameters
xlim, ylim, xlab, ylab, main, sub, pch, lty, lwd, col, axes, type
library(ggplot2)
Load the ggplot2 package
p <- ggplot(df, aes(x=xvar, y=yvar))+geom_line()
Aesthetics are what you are going to plot, geoms are how you are going to plot it
ggplot - Scales
Use to change automatically chosen axis components. Can specify name, limits, labels, breaks (control tick marks) and na.value.
p+scale_color_discrete()
ggplot - facet_wrap(~var)
Put graphs of different groups into different panels.
p+facet_wrap(~variable)
ggplot - facet_grid(var1~var2)
Good if we have multiple variables to facet on.
ggplot - library(patchwork)
Combine separate ggplots in a grid. Once loaded, p1/p2 puts p1 above p2, p1+p2 puts p1 next to p2.
ggplot - theme_bw()
Modify general appearance of the plot with themes. This changes plot background to white.
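A sketch that chains the ggplot pieces above into one plot. It assumes the ggplot2 package is installed; the data frame and its column names are invented for the example.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

# Hypothetical data: two groups measured at five points
df <- data.frame(
  xvar  = rep(1:5, times = 2),
  yvar  = c(1, 3, 2, 5, 4, 2, 2, 4, 3, 6),
  group = rep(c("a", "b"), each = 5)
)

p <- ggplot(df, aes(x = xvar, y = yvar, color = group)) +
  geom_line() +                            # geom: how to draw
  scale_color_discrete(name = "Group") +   # scale: rename the legend
  facet_wrap(~group) +                     # one panel per group
  theme_bw()                               # white background

print(p)
```

Each + adds a layer or setting to the plot object, so the pieces can be added or dropped independently.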