Cheatography

Data Analysis with R Cheat Sheet (DRAFT) by

Intro to Data Analytics using R

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Phases of Data Analysis

Ask
Define the problem you are trying to solve.
Prepare
What data do I need to solve this problem? Can I access or obtain it?
Process
Clean the data of errors and inaccuracies.
Analyze
Perform calculations to tell a story with the data, e.g. exploratory analysis and statistical modelling.
Share
Communicate findings with clear visuals of the data and solution, including the reproducible code.
Act
Provide recommendations based on the data.

File Manipulation

Get Working Directory
getwd()
Set Working Directory
setwd("path/to/folder")
See Directory Contents
dir()
Create Folder
dir.create("tFolder")
Create File
file.create("test.csv")
Copy File
file.copy("test.csv", "tFolder")
Edit File
file.edit("test.R")
Delete File
unlink("test.csv")
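The following is a minimal end-to-end sketch of the file commands above. It runs inside a scratch folder under `tempdir()` so it never touches your real working directory. The names `tFolder` and `test.csv` come from the sheet. The scratch folder name `cheatsheet_demo` is a hypothetical choice for this example.

```r
# Work in a scratch folder under tempdir() so your real files are untouched
old_wd  <- getwd()
scratch <- file.path(tempdir(), "cheatsheet_demo")  # hypothetical folder name
dir.create(scratch, showWarnings = FALSE)
setwd(scratch)

dir.create("tFolder", showWarnings = FALSE)  # create a sub-folder
file.create("test.csv")                      # create an empty file
copied <- file.copy("test.csv", "tFolder")   # TRUE if the copy succeeded
print(dir())                                 # lists test.csv and tFolder
print(dir("tFolder"))                        # lists the copied test.csv

unlink("test.csv")                   # delete a file
unlink("tFolder", recursive = TRUE)  # folders are only deleted with recursive = TRUE

setwd(old_wd)  # restore the original working directory
```

Note that `unlink()` leaves directories in place unless `recursive = TRUE` is set. That is why the folder needs the extra argument.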

Structure & Dimensions

Structure
str(data)
Get # of Rows & Columns
dim(data)
Return # of Rows
nrow(data)
Return # of Cols
ncol(data)
Return 1st 6 Rows
head(data)
Get Class Type
class(data)
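Here is a quick sketch of the structure commands, run on `mtcars`. This data set ships with base R (32 cars, 11 numeric columns). It was chosen only for illustration and is not part of the original sheet.

```r
# mtcars is bundled with base R (datasets package)
data <- mtcars

str(data)           # column names, types, and first few values
dims <- dim(data)   # c(rows, columns)
print(dims)         # 32 11
print(nrow(data))   # 32
print(ncol(data))   # 11
print(head(data))   # first 6 rows
print(class(data))  # "data.frame"
```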
 

Importing Data

Web Scraping
con <- url("http://google.com")
htmlCode <- readLines(con)
close(con)
Remote File
fileUrl <- "https://website.com/data.csv"
download.file(fileUrl, destfile = "./myData.csv", method = "curl")
Import Data as Table
inData <- read.table("data.csv", sep = ",", header = TRUE)
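The sketch below shows `read.table()` on a comma-separated file without needing network access. It writes a small, made-up data frame to a temporary CSV with `write.csv()`, then reads it back. `read.csv()` is included for comparison: it is `read.table()` with CSV-friendly defaults such as `sep = ","` and `header = TRUE`.

```r
# Hypothetical example data
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)  # header row plus 3 data rows

# read.table() needs sep = "," to split CSV fields correctly
inData <- read.table(path, sep = ",", header = TRUE)

# read.csv() presets sep = "," and header = TRUE
inData2 <- read.csv(path)

print(inData)
unlink(path)  # clean up the temporary file
```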

Applying Functions

Apply a function over the margins (rows or columns) of a matrix or array
apply(data, MARGIN, FUN)  # MARGIN: 1 = rows, 2 = columns
Apply a function to each element of a list, vector, or data frame and return a list
lapply(data, FUN)
Same as lapply, but simplifies the result to a vector or matrix when possible
sapply(data, FUN)
Apply a function to each group of a vector, where the groups are defined by a factor (or list of factors)
tapply(vector, factorList, FUN)
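The small example below contrasts the four apply functions. The matrix `m`, list `nums`, and vectors `values`/`groups` are hypothetical inputs chosen so the results are easy to check by hand.

```r
m <- matrix(1:6, nrow = 2)        # filled by column: (1,2), (3,4), (5,6)
row_sums <- apply(m, 1, sum)      # 9 12
col_sums <- apply(m, 2, sum)      # 3 7 11

nums <- list(a = 1:3, b = 4:6)
as_list <- lapply(nums, mean)     # a list: $a = 2, $b = 5
as_vec  <- sapply(nums, mean)     # a named numeric vector: a = 2, b = 5

values <- c(10, 20, 30, 40)
groups <- factor(c("x", "y", "x", "y"))
grp_means <- tapply(values, groups, mean)  # x = 20, y = 30

print(row_sums); print(col_sums); print(as_list); print(as_vec); print(grp_means)
```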

Clean & Test Data

Count NAs per Column
colSums(is.na(data))
Test That There Are No NAs
all(colSums(is.na(data)) == 0)
Trim Whitespace
trimws(charVector)
Verify Data Type
class(data) or str(data)
Filter Rows by Specific Values
test[test$someCol %in% c("abcdefg", "hello"), ]
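The sketch below runs the cleaning checks on a hypothetical messy data frame. The name `test` and column `someCol` are taken from the sheet, but the values and the `score` column are invented. Note that `%in%` returns `FALSE` for `NA`, so rows with a missing `someCol` are dropped by the filter.

```r
# Hypothetical messy data frame
test <- data.frame(
  someCol = c("  hello ", "abcdefg", "world", NA),
  score   = c(1, NA, 3, 4),
  stringsAsFactors = FALSE
)

na_counts <- colSums(is.na(test))  # someCol 1, score 1
no_nas <- all(na_counts == 0)      # FALSE: the data still has NAs

test$someCol <- trimws(test$someCol)  # "  hello " becomes "hello"

# Keep only rows whose someCol is one of the listed values
matches <- test[test$someCol %in% c("abcdefg", "hello"), ]
print(matches)
```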
 

String Manipulation

Uppercase
toupper(charVector)  # or toupper(names(data)) for column names
Lowercase
tolower(charVector)
Split on a Literal Period
strsplit(charVector, "\\.")
Find & Replace First Match
sub("_", "", charVector)
Find & Replace All Matches
gsub("_", "", charVector)
Get Index of Matches
grep("F", LETTERS)
Get Matching Values
grep("F", LETTERS, value = TRUE)
Count Matches with a Table
table(grepl("F", LETTERS))
Get Substring
substr(charData, 1, 7)
Paste with Space
paste("Test", "Message")
Paste Without Space
paste0("Test", "Message")
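The string functions are shown here on concrete inputs so each result can be checked. The values in `charVector` and `x` are hypothetical; `LETTERS` is the built-in vector of uppercase letters.

```r
charVector <- c("first_name", "last.name")  # hypothetical input
x <- "snake_case_name"                      # hypothetical input

toupper(charVector)              # "FIRST_NAME" "LAST.NAME"
tolower("ZIP_CODE")              # "zip_code"
strsplit(charVector, "\\.")      # list: "first_name" unchanged; "last" "name"
sub("_", "", x)                  # "snakecase_name" (first match only)
gsub("_", "", x)                 # "snakecasename"  (every match)
grep("F", LETTERS)               # 6 (position of "F")
grep("F", LETTERS, value = TRUE) # "F"
table(grepl("F", LETTERS))       # FALSE 25, TRUE 1
substr("Data Analysis", 1, 7)    # "Data An"
paste("Test", "Message")         # "Test Message"
paste0("Test", "Message")        # "TestMessage"
```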

Statistics

Statistical Summary
summary(data)
Mean
mean(vector)
Standard Deviation
sd(vector)
Variance
var(vector)
Range
range(vector)
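This is a worked example of the summary statistics on a small hypothetical vector. The values were chosen so the arithmetic is easy to follow: the sum is 40, the mean is 5, and the squared deviations add to 32. `sd()` and `var()` use the sample (n - 1) denominator, so the variance is 32 / 7.

```r
vector <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical data

summary(vector)  # Min, 1st Qu., Median, Mean, 3rd Qu., Max
mean(vector)     # 5
var(vector)      # 32 / 7, about 4.571
sd(vector)       # sqrt(32 / 7), about 2.138
range(vector)    # 2 9
```

`mean()` expects a numeric vector. For a data frame of numeric columns, `colMeans(data)` or `sapply(data, mean)` returns one mean per column.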
 
Random Draws: Normal Distribution
rnorm(n, mean, sd)
Random Draws: Binomial Distribution
rbinom(n, size, prob)
Random Draws: Poisson Distribution
rpois(n, lambda)
Random Draws: Uniform Distribution
runif(n, min = 0, max = 10)
Random Draws: Exponential Distribution
rexp(n, rate = 1)
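The following sketch draws 1000 values from each distribution and compares the sample means with the theoretical ones. The parameter values (`size = 10`, `prob = 0.5`, `lambda = 3`, and so on) are arbitrary choices for illustration. `set.seed()` makes the draws reproducible.

```r
set.seed(42)  # reproducible random draws

n <- 1000
norm_draws  <- rnorm(n, mean = 0, sd = 1)
binom_draws <- rbinom(n, size = 10, prob = 0.5)  # successes out of 10 trials
pois_draws  <- rpois(n, lambda = 3)              # counts with mean lambda
unif_draws  <- runif(n, min = 0, max = 10)
exp_draws   <- rexp(n, rate = 1)                 # mean is 1 / rate

# Sample means should sit close to the theoretical means 0, 5, 3, 5, 1
round(c(mean(norm_draws), mean(binom_draws), mean(pois_draws),
        mean(unif_draws), mean(exp_draws)), 2)
```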
 
K-Means Clustering
kmeans(data, centers = 3)
Hierarchical Clustering
hclust(dist(data))
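Below is a sketch of both clustering calls on the four numeric columns of the built-in `iris` data. The choice of `iris`, `nstart = 10`, and `k = 3` are assumptions for illustration. `cutree()` is an added step that turns the hierarchical tree into flat group labels.

```r
# iris is bundled with base R; keep only the 4 numeric measurements
data <- iris[, 1:4]

set.seed(1)  # k-means starts from random centers
km <- kmeans(data, centers = 3, nstart = 10)  # nstart: keep the best of 10 starts
table(km$cluster, iris$Species)               # compare clusters with species labels

hc <- hclust(dist(data))      # Euclidean distances, complete linkage by default
groups <- cutree(hc, k = 3)   # cut the tree into 3 groups
table(groups)
```

The cluster numbers are arbitrary labels, so compare them with `Species` through the cross-table rather than by value.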