
Basic statistics with R Cheat Sheet (DRAFT) by

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Descriptive statistics

Base installation
summary(), mean(), sd(), var(), min(), max(), median(), length(), range(), quantile(), fivenum()
Hmisc package
describe()
pastecs package
stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)
basic=TRUE - number of values, null values, missing values, min, max, range, sum
desc=TRUE - median, mean, standard error of the mean, confidence interval for the mean at level p (95% by default), variance, standard deviation, coefficient of variation
norm=TRUE - skewness, kurtosis, Shapiro-Wilk test of normality
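psych package
describe()
Both Hmisc and psych define describe(), so the package loaded last masks the other. To call a function that has been masked, prefix it with its package name, e.g. Hmisc::describe(x)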
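
As a minimal sketch of the base functions listed above, the following uses the built-in mtcars data set. The object names (vars, overview, col_means and so on) are illustrative assumptions, and the Hmisc call is guarded so that the block still runs when that package is not installed.

```r
# `vars` is an illustrative name for the columns to summarise
vars <- c("mpg", "hp", "wt")

overview  <- summary(mtcars[vars])       # min, quartiles, median and mean per column
col_means <- sapply(mtcars[vars], mean)  # one mean per column
col_sds   <- sapply(mtcars[vars], sd)    # one standard deviation per column
mpg_range <- range(mtcars$mpg)           # c(minimum, maximum)
mpg_five  <- fivenum(mtcars$mpg)         # Tukey's five-number summary
mpg_tails <- quantile(mtcars$mpg, probs = c(0.1, 0.9))  # 10th and 90th percentiles

print(overview)
print(round(col_means, 2))

# Hmisc and psych both export describe(); the :: prefix selects one explicitly.
# Guarded so the sketch still runs when Hmisc is not installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(mtcars[vars]))
}
```

Wrapping the single-vector functions in sapply() is one simple way to apply them to several columns at once; summary() already works column by column on a data frame.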

Descriptive statistics by group

aggregate()
Single value function - aggregate(mtcars[vars], by=list(am=mtcars$am), mean)
where vars is a character vector of column names, e.g. vars <- c("mpg", "hp", "wt")
by()
Several functions - by(data, INDICES, FUN)
dstats <- function(x) sapply(x, function(v) c(mean=mean(v), sd=sd(v)))
by(mtcars[vars], mtcars$am, dstats)
doBy package
summaryBy(formula, data=dataframe, FUN=function)
Formula - var1 + var2 + ... ~ groupvar1 + groupvar2 + ...
summaryBy(mpg+hp+wt~am, data=mtcars, FUN=mystats) # mystats is a user-defined summary function
psych package
describeBy(mtcars[vars], mtcars$am) # replaces the older describe.by()
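
The base R approaches above can be sketched as follows, again with mtcars. The names vars, group_means, dstats and group_stats are illustrative. Note that by() passes each group to FUN as a data frame, so the helper here applies mean() and sd() column by column rather than to the data frame as a whole.

```r
vars <- c("mpg", "hp", "wt")

# One statistic per group: mean of each variable by transmission type (am)
group_means <- aggregate(mtcars[vars], by = list(am = mtcars$am), FUN = mean)

# Several statistics per group: by() hands each subset to FUN as a data frame,
# so the helper computes mean and sd for every column of that subset
dstats <- function(df) sapply(df, function(v) c(mean = mean(v), sd = sd(v)))
group_stats <- by(mtcars[vars], mtcars$am, dstats)

print(group_means)
print(group_stats)
```

aggregate() returns a data frame with one row per group, while by() returns one result per group (here a 2 x 3 matrix of means and standard deviations).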

Frequencies and contingency tables

table(var1, var2, ..., varN)
Creates an N-way contingency table from N categorical variables (factors). Ignores missing values (NAs) by default; pass useNA="ifany" to include NA as a valid category.
xtabs(formula, data)
Creates an N-way contingency table based on a formula and a matrix or data frame
prop.table(table, margins)
Expresses table entries as fractions of the marginal table defined by margins
margin.table(table, margins)
Computes the sum of table entries for a marginal table defined by margins
addmargins(table, margins)
Puts summary margins (sums by default) on a table
ftable(table)
Creates a compact "flat" contingency table
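
To illustrate how these functions treat missing values and margins, here is a small sketch on a made-up data frame. The data and all object names are hypothetical and chosen only so that the counts are easy to check by hand.

```r
# Small illustrative data set with one missing group value (hypothetical data)
d <- data.frame(
  group   = c("a", "a", "b", "b", "b", NA),
  outcome = c("yes", "no", "yes", "yes", "no", "yes")
)

t_default <- table(d$group)                   # NA is dropped by default
t_with_na <- table(d$group, useNA = "ifany")  # NA counted as its own category

xt         <- xtabs(~ group + outcome, data = d)  # two-way table from a formula
row_props  <- prop.table(xt, 1)                   # proportions within each row
col_totals <- margin.table(xt, 2)                 # column sums
with_sums  <- addmargins(xt)                      # adds a Sum row and a Sum column

print(t_with_na)
print(with_sums)
```

With its default settings xtabs() also leaves out the row whose group is NA, so the two-way table is built from five of the six observations.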

Example code

One way table
# the Arthritis data set comes with the vcd package
mytable <- with(Arthritis, table(Improved))
prop.table(mytable) # turn frequencies into proportions
prop.table(mytable)*100 # turn frequencies into percentages

Two way table
mytable <- with(Arthritis, table(Treatment, Improved))
# or, using the formula interface:
mytable <- xtabs(~ Treatment + Improved, data = Arthritis)


margin.table(mytable, 1) # row sums (marginal frequencies); use 2 for column sums
prop.table(mytable, 1) # row proportions; use 2 for column proportions
prop.table(mytable) # cell proportions
addmargins(mytable) # adds a sum row and column
addmargins(prop.table(mytable)) # cell proportions with a sum row and column
addmargins(prop.table(mytable, 1), 2) # row proportions plus a sum column
addmargins(prop.table(mytable, 2), 1) # column proportions plus a sum row


Two way tables can also be created with the CrossTable() function in the gmodels package


Three way table
The ftable() function prints multidimensional tables in a compact form
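
A possible three-way example, sketched with the built-in mtcars data rather than Arthritis so that it runs without extra packages. The choice of variables (cyl, gear, am) and the object names are assumptions made for illustration.

```r
# Cross-classify cars by cylinders, gears and transmission type
three_way <- xtabs(~ cyl + gear + am, data = mtcars)

# Printing a 3-D table directly gives one 2-D slice per level of `am`;
# ftable() flattens it into a single compact layout instead
flat <- ftable(three_way)

print(three_way)
print(flat)
```

By default ftable() puts the last variable (here am) in the columns and combines the remaining variables in the rows.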
 

Chi-square test of independence (Two-way table)

Measures of association (Two-way table)

Covariances / correlations

Options for cov(x, use=, method=) and cor(x, use=, method=)
x
Matrix or data frame
use
Specifies the handling of missing data. Options are all.obs (assumes no missing data; missing data will produce an error), everything (any correlation involving a case with missing values will be set to missing), complete.obs (listwise deletion), and pairwise.complete.obs (pairwise deletion).
method
Specifies the type of correlation. The options are pearson, spearman, or kendall.
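
The effect of the use and method options can be sketched with mtcars after deliberately blanking a few values. The injected NAs and the object names are assumptions for illustration only.

```r
x <- mtcars[c("mpg", "hp", "wt")]
x$hp[1:3] <- NA  # inject missing values for illustration

# Default use = "everything": correlations between hp and other variables are NA
cor_everything <- cor(x)

# Listwise deletion: rows with any NA are dropped before computing
cor_complete <- cor(x, use = "complete.obs")
cov_complete <- cov(x, use = "complete.obs")

# Pairwise deletion combined with a rank-based correlation
cor_pairwise <- cor(x, use = "pairwise.complete.obs", method = "spearman")

# use = "all.obs" stops with an error when any value is missing
all_obs_result <- tryCatch(cor(x, use = "all.obs"),
                           error = function(e) "error")

print(round(cor_complete, 2))
```

Listwise deletion keeps every correlation based on the same rows, whereas pairwise deletion uses as many rows as each pair of variables allows, so the two can disagree when missingness is uneven.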

Partial correlations

Testing correlations for significance

Independent t-test

Dependent t-test