Cheatography

# R Programming Codes Cheat Sheet by [deleted]

library(readr) library(ggplot2) library(dplyr) library(broom) library(Tmisc) library(caret) library(splines) library(party) library(leaps) library(glmnet)

### Apply Functions

(m = matrix, a = array, l = list, v = vector, d = data frame)
apply(x, index, fun) [input: m; output: a or l; applies function fun to rows/cols/cells (index) of x]
lapply(x, fun) [input: l; output: l; applies fun to each element of list x]
sapply(x, fun) [input: l; output: v; user-friendly wrapper for lapply(); see also replicate()]
tapply(x, index, fun) [input: v; output: l; applies fun to subsets of x, grouped by index]
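A minimal base-R illustration of the apply family; the inputs here are toy data made up for demonstration:

```r
# Toy inputs for the apply family
m <- matrix(1:6, nrow = 2)              # 2 x 3 matrix

apply(m, 1, sum)                        # row sums: c(9, 12)
lapply(1:3, function(i) i^2)            # list: 1, 4, 9
sapply(1:3, function(i) i^2)            # simplified to vector: c(1, 4, 9)
tapply(c(1, 2, 3, 4),                   # group means by index:
       c("a", "a", "b", "b"), mean)     # a = 1.5, b = 3.5
```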

### Clustering

```r
wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b", xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}
```
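For context, a small k-means run using only base stats; the built-in iris data and k = 3 are illustrative choices, not from the original sheet (the section's wssplot() function would be used the same way to pick the number of centers):

```r
# Scale the numeric columns, then cluster
df <- scale(iris[, 1:4])
set.seed(1234)
km <- kmeans(df, centers = 3)

km$size              # observations per cluster
sum(km$withinss)     # total within-cluster sum of squares
```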

### GGplot

ggplot(mydata, aes(xvar, yvar)) + geom_point(aes(color = groupvar)) + geom_smooth(method = "lm")
qplot(x = cty, y = hwy, data = mpg, geom = "point") [creates a complete plot with given data, geom, and mappings; supplies many useful defaults]
last_plot() [returns the last plot]
ggsave("plot.png", width = 5, height = 5) [saves the last plot as a 5" x 5" file named "plot.png" in the working directory; matches file type to file extension]
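A runnable sketch combining the calls above (assumes ggplot2 is installed; the mpg dataset ships with ggplot2, and the aesthetics are illustrative):

```r
library(ggplot2)

# Scatter plot colored by group, with a linear-model smoother
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm")

# Save the plot as a 5" x 5" PNG in the working directory
ggsave("plot.png", p, width = 5, height = 5)
```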

### Setup

createDummyFeatures(obj=, target=, method=, cols=) [creates (0,1) flags for each non-numeric variable, excluding the target]
normalizeFeatures(obj=, target=, method=, cols=, range=, on.constant=) [method options: "center" subtracts the mean; "scale" divides by the std. deviation; "standardize" centers and scales; "range" linearly scales to a given range]
mergeSmallFactorLevels(task=, cols=, min.perc=) [combines infrequent factor levels into a single merged level]

### Basic Codes

read_csv("path/nhanes.csv")
View(df)
filter(df, ...) [filters a data frame according to a condition]
mean, median, range [na.rm = TRUE]
t.test(y ~ grp, data = df)
wilcox.test(y ~ grp, data = df)
anova(lmfit)
TukeyHSD(aov(lmfit)) [ANOVA post-hoc pairwise contrasts]
xt <- xtabs(~ x1 + x2, data = df)
addmargins(xt) prop.table(xt)
chisq.test(xt) fisher.test(xt) mosaicplot(xt)
factor(x, levels = c("wt", "mutant"))
relevel(x, ref = "wildtype")
power.t.test(n, power, sd, delta)
power.prop.test(n, power, p1, p2)
tidy() augment() glance() [model tidying functions in the broom package]
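A few of the tests above, run on built-in datasets; sleep and mtcars here are stand-ins for your own data frame:

```r
# Two-group comparisons (y ~ grp form)
t.test(extra ~ group, data = sleep)
wilcox.test(extra ~ group, data = sleep)

# Contingency-table workflow
xt <- xtabs(~ cyl + gear, data = mtcars)
addmargins(xt)     # row/column totals
prop.table(xt)     # cell proportions
chisq.test(xt)     # may warn about small expected counts on this toy table
```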

### Model Functions

aov(formula, data) [analysis of variance model]
lm(formula, data) [fit linear models]
glm(formula, family, data) [fit generalized linear models]
nls(formula, data) [nonlinear least-squares estimates of the nonlinear model parameters]
lmer(formula, data) [fit mixed-effects model (lme4); see also lme() in nlme]
anova(fit, ...) [provides sequential sums of squares and corresponding F-tests for objects]
contrasts(x, contrasts = TRUE) [view contrasts associated with a factor]
contrasts(x, how.many) <- value [set contrasts for a factor]
glht(fit, linfct) [makes multiple comparisons using a linear function linfct (multcomp)]
summary(fit) [summary of model, often with t-values]
confint(fit, parm) [confidence intervals for one or more parameters in a fitted model]
predict(fit, ...) [predictions from fit]
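A minimal lm() workflow on the built-in mtcars data; the dataset and formula are illustrative:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit)      # coefficient table with t-values
confint(fit)      # 95% confidence intervals for the parameters
anova(fit)        # sequential sums of squares and F-tests
predict(fit, newdata = data.frame(wt = 3, hp = 110))  # prediction for new data
```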

### Decision Tree

ctree(formula, data) [formula describes the response and predictor variables (party)]

### Data Inform­ation

is.na(x) is.nan(x) is.null(x) is.array(x) is.complex(x) is.character(x) is.data.frame(x) is.numeric(x)
head(x) tail(x) summary(x) str(x)
length(x) dim(x) dimnames(x) attr(x, which)
nrow(x) ncol(x) NROW(x) NCOL(x)
class(x) unclass(x)
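A quick tour of these inspection functions on a toy data frame:

```r
x <- data.frame(a = 1:3, b = c("u", "v", "w"))

str(x)             # compact structure: column types and first values
dim(x)             # c(3, 2)
nrow(x); ncol(x)   # 3 rows, 2 columns
class(x)           # "data.frame"
is.na(x)           # element-wise missing-value check
```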

### Data Splitting and Manipu­lating

createDataPartition(y, p = 0.8) [splits a vector y with 80 percent of the data in one part and 20 percent in the other]
trainControl(summaryFunction = , classProbs = ) [controls training parameters such as resampling, number of folds, iterations, etc.]
densityplot.rfe(x, data, ...) [lattice functions for plotting resampling results of recursive feature selection]
featurePlot(x, y, plot, ...) [a shortcut to produce lattice plots]

### Polynomial regression

medv = b0 + b1*lstat + b2*lstat^2
lm(medv ~ lstat + I(lstat^2), data = train.data)
lm(medv ~ poly(lstat, 2, raw = TRUE), data = train.data)
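The two specifications above fit the same quadratic model; illustrated here on mtcars, which stands in for the sheet's train.data:

```r
# Quadratic term via I() vs. a raw polynomial basis
fit1 <- lm(mpg ~ hp + I(hp^2), data = mtcars)
fit2 <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)

# Same fitted values either way
all.equal(unname(fitted(fit1)), unname(fitted(fit2)))
```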

### Spline Model

spline(x, y) [cubic spline interpolation]
splineKnots(object) [return the knot vector of a spline]
knots <- quantile(train.data$lstat, p = c(0.25, 0.5, 0.75))
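A regression-spline sketch with bs() from the splines package (shipped with base R); mtcars stands in for the sheet's train.data:

```r
library(splines)

# Knots at the quartiles of the predictor, as in the sheet's quantile() call
knots <- quantile(mtcars$hp, probs = c(0.25, 0.5, 0.75))
fit <- lm(mpg ~ bs(hp, knots = knots), data = mtcars)
summary(fit)
```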

### Step-wise Selection

null <- lm(y ~ 1, data = dtrain)
full <- lm(y ~ ., data = dtrain)
step(null, scope = list(lower = null, upper = full), direction = "forward")
step(full, scope = list(lower = null, upper = full), direction = "backward")
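A runnable stepwise example on mtcars; the dataset is illustrative, and trace = 0 merely suppresses the step-by-step log:

```r
null <- lm(mpg ~ 1, data = mtcars)
full <- lm(mpg ~ ., data = mtcars)

fwd <- step(null, scope = list(lower = null, upper = full),
            direction = "forward", trace = 0)
bwd <- step(full, scope = list(lower = null, upper = full),
            direction = "backward", trace = 0)
formula(fwd)   # the selected forward model
```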

### Prepro­cessing

Transformations, filters, and other operations can be applied to the predictors with the preProc option: train(..., preProc = c("method1", "method2"), ...). train() determines the order of operations; the order in which the methods are declared does not matter. The recipes package has a more extensive list of preprocessing operations.