
R Programming Codes Cheat Sheet by [deleted]

Import Libraries

library(readr)
library(ggplot2)
library(dplyr)
library(broom)
library(Tmisc)
library(caret)
library(splines)
library(party)
library(leaps)
library(glmnet)

Apply Functions

(m = matrix, a = array, l = list, v = vector, d = data frame)
apply(x, index, fun) [input: m; output: a or l; applies fun to rows/cols/cells (index) of x]
lapply(x, fun) [input: l; output: l; applies fun to each element of list x]
sapply(x, fun) [input: l; output: v; user-friendly wrapper for lapply(); see also replicate()]
tapply(x, index, fun) [input: v; output: a or l; applies fun to subsets of x, grouped by index]
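A minimal sketch of the apply family on built-in objects:

m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)                        # row sums (index 1 = rows)
apply(m, 2, sum)                        # column sums (index 2 = columns)
l <- list(a = 1:3, b = 4:6)
lapply(l, mean)                         # list in, list out
sapply(l, mean)                         # list in, named vector out
tapply(mtcars$mpg, mtcars$cyl, mean)    # mean mpg within each cyl group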

Clustering

wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b", xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}
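For example, on a hypothetical numeric data frame df (scaling first is usually advisable):

wssplot(scale(df))                          # elbow plot to pick a cluster count
km <- kmeans(scale(df), centers = 3, nstart = 25)
km$cluster                                  # cluster assignment per row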

GGplot

ggplot(mydata, aes(xvar, yvar)) + geom_point(aes(color = groupvar)) + geom_smooth(method = "lm")
qplot(x = cty, y = hwy, data = mpg, geom = "point") [creates a complete plot with given data, geom, and mappings; supplies many useful defaults]
last_plot() [returns the last plot]
ggsave("plot.png", width = 5, height = 5) [saves the last plot as a 5 in x 5 in file named "plot.png" in the working directory; matches file type to file extension]

Setup

createDummyFeatures(obj=, target=, method=, cols=) [creates (0,1) flags for each non-numeric variable, excluding the target]
normalizeFeatures(obj=, target=, method=, cols=, range=, on.constant=) [normalizes numeric features; method is one of:]
center [subtract the mean]
scale [divide by the std. deviation]
standardize [center and scale]
range [linear scale to a given range]
mergeSmallFactorLevels(task=, cols=, min.perc=) [combine infrequent factor levels into a single merged level]
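These are mlr functions. A minimal sketch, assuming the mlr package and a hypothetical data frame df with target column y:

library(mlr)
df2 <- createDummyFeatures(df, target = "y")                         # (0,1) flags for non-numeric predictors
df2 <- normalizeFeatures(df2, target = "y", method = "standardize")  # center and scale numeric columns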
 

Basic Codes

read_csv("path/nhanes.csv")
View(df)
filter(df, ...) [filters a data frame according to a condition]
mean(x), median(x), range(x) [use na.rm = TRUE to drop missing values]
t.test(y ~ grp, data = df)
wilcox.test(y ~ grp, data = df)
anova(lmfit)
TukeyHSD(aov(lmfit)) [ANOVA post-hoc pairwise contrasts]
xt <- xtabs(~ x1 + x2, data = df)
addmargins(xt)
prop.table(xt)
chisq.test(xt)
fisher.test(xt)
mosaicplot(xt)
factor(x, levels = c("wt", "mutant"))
relevel(x, ref = "wildtype")
power.t.test(n, power, sd, delta)
power.prop.test(n, power, p1, p2)
tidy(), augment(), glance() [model tidying functions in the broom package]
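A short workflow sketch (the file path and column names are illustrative):

library(readr)
library(dplyr)
nh <- read_csv("path/nhanes.csv")        # hypothetical file
kids <- filter(nh, Age < 18)             # hypothetical Age column
mean(kids$Height, na.rm = TRUE)          # hypothetical Height column
t.test(Height ~ Gender, data = kids)     # hypothetical Gender grouping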

Model Functions

aov(formula, data) [analysis of variance model]
lm(formula, data) [fit linear models]
glm(formula, family, data) [fit generalized linear models]
nls(formula, data) [nonlinear least-squares estimates of the nonlinear model parameters]
lmer(formula, data) [fit mixed-effects models (lme4); see also lme() in nlme]
anova(fit, ...) [sequential sums of squares and corresponding F-tests]
contrasts(x, contrasts = TRUE) [view contrasts associated with factor x]
contrasts(x, how.many) <- value [set contrasts for factor x]
glht(fit, linfct) [multiple comparisons using a linear function linfct (multcomp)]
summary(fit) [summary of model, often with t-values]
confint(fit) [confidence intervals for one or more parameters in a fitted model]
predict(fit, ...) [predictions from fit]
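A brief sketch with the built-in mtcars data:

fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)                                          # coefficients with t-values
confint(fit)                                          # CIs for all parameters
predict(fit, newdata = data.frame(wt = 3, hp = 120))  # prediction at new values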

Decision Tree

ctree(formula, data) [formula describes the predictor and response variables]
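A minimal sketch, assuming the party package and the built-in iris data:

library(party)
tree <- ctree(Species ~ ., data = iris)   # response ~ predictors
plot(tree)                                # draw the fitted tree
predict(tree, newdata = iris[1:5, ])      # predicted classes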
 

Data Information

is.na(x)
is.nan(x)
is.null(x)
is.array(x)
is.complex(x)
is.character(x)
is.data.frame(x)
is.numeric(x)
head(x)
tail(x)
summary(x)
str(x)
length(x)
dim(x)
dimnames(x)
attr(x, which)
nrow(x)
ncol(x)
NROW(x)
NCOL(x)
class(x)
unclass(x)
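For example, on the built-in mtcars data frame:

str(mtcars)             # column types and first values
dim(mtcars)             # 32 rows, 11 columns
class(mtcars)           # "data.frame"
is.numeric(mtcars$mpg)  # TRUE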

Data Splitting and Manipulating

createDataPartition(y, p = 0.8) [splits vector y with 80 percent of the data in one part and 20 percent in the other]
trainControl(summaryFunction = <R function>, classProbs = <logical>) [controls training parameters such as resampling method, number of folds, iterations, etc.]
densityplot.rfe(x, data, ...) [lattice functions for plotting resampling results of recursive feature selection]
featurePlot(x, y, plot, ...) [a shortcut to produce lattice plots]
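A minimal sketch with the built-in iris data, assuming caret:

library(caret)
set.seed(123)
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_df <- iris[idx, ]                           # 80% training split
test_df  <- iris[-idx, ]                          # 20% held out
ctrl <- trainControl(method = "cv", number = 10)  # 10-fold CV settings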

Polynomial Regression

medv = b0 + b1*lstat + b2*lstat^2
lm(medv ~ lstat + I(lstat^2), data = train.data)
lm(medv ~ poly(lstat, 2, raw = TRUE), data = train.data)
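A runnable sketch, assuming train.data is drawn from the Boston data in the MASS package (which contains medv and lstat):

library(MASS)
set.seed(123)
idx <- sample(nrow(Boston), 0.8 * nrow(Boston))
train.data <- Boston[idx, ]
fit <- lm(medv ~ poly(lstat, 2, raw = TRUE), data = train.data)
summary(fit)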

Spline Model

spline(x, y) [cubic spline interpolation]
splineKnots(object) [knot vector of a spline object]
knots <- quantile(train.data$lstat, p = c(0.25, 0.5, 0.75))
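A sketch of a regression spline using those knots, assuming the splines package and the Boston-based train.data above:

library(splines)
fit <- lm(medv ~ bs(lstat, knots = knots), data = train.data)  # cubic B-spline basis
summary(fit)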

Step-wise Selection

null <- lm(y ~ 1, data = dtrain) [intercept-only model]
full <- lm(y ~ ., data = dtrain) [model with all predictors]
step(null, scope = list(lower = null, upper = full), direction = "forward")
step(full, scope = list(lower = null, upper = full), direction = "backward")

Preprocessing

Transformations, filters, and other operations can be applied to the predictors with the preProc option.
train(..., preProc = c("method1", "method2"))
train() determines the order of operations; the order in which the methods are declared does not matter.
The recipes package has a more extensive list of preprocessing operations.
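A minimal sketch with the built-in iris data, assuming caret (preProc partially matches train's preProcess argument):

library(caret)
fit <- train(Species ~ ., data = iris,
             method = "knn",
             preProc = c("center", "scale"),   # applied within each resample
             trControl = trainControl(method = "cv", number = 5))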
 
