Descriptive statistics
Base installation - summary(), mean(), sd(), var(), min(), max(), median(), length(), range(), quantile(), fivenum()
pastecs package - stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)
basic=TRUE - no. of values, null values, missing values, min, max, range, sum
desc=TRUE - median, mean, std error of mean, 95% CI for mean (confidence level set by p), variance, std dev, coefficient of variation
norm=TRUE - skewness, kurtosis, Shapiro-Wilk test of normality
To call a function that has been masked by another package, prefix it with its package name, e.g. Hmisc::describe(x)
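As a rough illustration of the base functions listed above, the sketch below summarises three columns of the built-in mtcars data. The helper name basic_stats and the choice of columns are assumptions made for this example only.

```r
# Columns chosen for illustration only; mtcars ships with base R.
vars <- c("mpg", "hp", "wt")

# Min, quartiles, median, mean and max of each column.
print(summary(mtcars[vars]))

# Hypothetical helper: count, mean and standard deviation of one vector.
basic_stats <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values first
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# sapply() applies the helper to every column and binds the results
# into a matrix with one column per variable.
stats_matrix <- sapply(mtcars[vars], basic_stats)
print(stats_matrix)

# Quartiles via quantile() and Tukey's five numbers via fivenum().
mpg_quartiles <- quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
mpg_fivenum <- fivenum(mtcars$mpg)
print(mpg_quartiles)
print(mpg_fivenum)
```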
Descriptive statistics by group
Single value function - aggregate(mtcars[vars], by=list(am=mtcars$am), mean), where vars <- c("mpg", "hp", "wt")
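Several functions - by(data, INDICES, FUN)
by() passes each group to FUN as a data frame, so the statistics are applied column by column with sapply():
dstats <- function(x) sapply(x, function(v) c(mean=mean(v), sd=sd(v)))
by(mtcars[vars], mtcars$am, dstats)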
doBy package - summaryBy(formula, data=dataframe, FUN=function)
Formula - var1 + var2 + ... ~ groupvar1 + groupvar2 + ...
summaryBy(mpg+hp+wt~am, data=mtcars, FUN=mystats), where mystats is a user-defined summary function
psych package - describeBy(mtcars[vars], mtcars$am) (named describe.by() in older versions of psych)
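The grouped summaries above can be tried with base R alone. This is a minimal sketch on the built-in mtcars data, assuming vars holds the three columns used in the summaryBy() example; the names group_means and group_stats are made up for the illustration.

```r
# Assumed variable selection, matching the mpg + hp + wt formula above.
vars <- c("mpg", "hp", "wt")

# One statistic per group: aggregate() returns a data frame with one row
# per level of am and one column per summarised variable.
group_means <- aggregate(mtcars[vars], by = list(am = mtcars$am), FUN = mean)
print(group_means)

# Several statistics per group: by() hands each group's rows to the
# function as a data frame, so sapply() walks over its columns.
dstats <- function(x) sapply(x, function(v) c(mean = mean(v), sd = sd(v)))
group_stats <- by(mtcars[vars], mtcars$am, dstats)
print(group_stats)
```

aggregate() is convenient when a single number per cell is enough, while by() keeps whatever structure the function returns for each group.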
Frequencies and contingency tables
table(var1, var2, ..., varN) - Creates an N-way contingency table from N categorical variables (factors). Ignores missing values (NAs) by default; set useNA="ifany" to include NA as a valid category.
xtabs(formula, data) - Creates an N-way contingency table based on a formula and a matrix or data frame.
prop.table(table, margins) - Expresses table entries as fractions of the marginal table defined by the margins.
margin.table(table, margins) - Computes the sum of table entries for a marginal table defined by the margins.
addmargins(table, margins) - Puts summary margins (sums by default) on a table.
ftable(table) - Creates a compact "flat" contingency table.
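The effect of useNA on table() is easy to check with a small vector. The factor below is invented purely to demonstrate the behaviour; it is not data from the notes.

```r
# Toy factor with two missing values (illustrative only).
x <- factor(c("a", "b", NA, "a", NA))

# Default: NAs are silently dropped from the counts.
default_counts <- table(x)
print(default_counts)

# useNA = "ifany" adds an <NA> category when missing values are present.
with_na <- table(x, useNA = "ifany")
print(with_na)
```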
Example code
One way table
library(vcd) # provides the Arthritis data set
mytable <- with(Arthritis, table(Improved))
prop.table(mytable) # turn frequencies into proportions
prop.table(mytable)*100 # turn frequencies into percentages
Two way table
mytable <- with(Arthritis, table(Treatment, Improved))
mytable <- xtabs(~ Treatment + Improved, data = Arthritis) # same table, built from a formula
margin.table(mytable, 1) # row sums (marginal frequencies); use 2 for column sums
prop.table(mytable, 1) # row proportions; use 2 for column proportions
prop.table(mytable) # cell proportions
addmargins(mytable) # adds a sum row and column
addmargins(prop.table(mytable))
addmargins(prop.table(mytable, 1), 2) # adds a sum column
addmargins(prop.table(mytable, 2), 1) # adds a sum row
Two way tables can also be created using the CrossTable() function in the gmodels package
Three way table
The ftable() function can print multidimensional tables in a compact form
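A three-way table and its flat form might look like the sketch below. It uses the built-in mtcars data rather than Arthritis so that it runs without extra packages; the choice of cyl, gear and am as classifying variables is an assumption for the example.

```r
# Cross-classify cars by cylinders, gears and transmission type.
t3 <- xtabs(~ cyl + gear + am, data = mtcars)

# Printing t3 directly shows one cyl-by-gear slice per level of am.
print(t3)

# ftable() flattens the array: by default the last variable forms the
# columns and the remaining ones are nested in the rows.
flat <- ftable(t3)
print(flat)
```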
Chi-square test of independence (Two-way table)
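One way to run such a test in base R is chisq.test() on a two-way table. The sketch below assumes the built-in HairEyeColor data, collapsed over sex, as a stand-in for whatever table is being analysed.

```r
# Collapse the 3-way HairEyeColor array to a Hair x Eye table.
hair_eye <- margin.table(HairEyeColor, c(1, 2))
print(hair_eye)

# Pearson chi-square test of independence between hair and eye colour.
chi_res <- chisq.test(hair_eye)
print(chi_res)

# A small p-value suggests the row and column variables are not independent.
p_value <- chi_res$p.value
```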
Measures of association (Two-way table)
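The notes do not say which measure is meant. Cramér's V is one common choice, and it can be computed by hand from the chi-square statistic as sqrt(X^2 / (n * min(r - 1, c - 1))). The sketch below again assumes the built-in HairEyeColor data.

```r
# Hair x Eye table, collapsed over sex (illustrative data choice).
tab <- margin.table(HairEyeColor, c(1, 2))

# Chi-square statistic for the table.
chi <- chisq.test(tab)

n <- sum(tab)                 # total number of observations
k <- min(dim(tab)) - 1        # min(rows - 1, columns - 1)

# Cramer's V: 0 means no association, 1 means perfect association.
cramers_v <- sqrt(unname(chi$statistic) / (n * k))
print(cramers_v)
```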
Covariances / correlations
Options for cov()/cor(): cor(x, use=, method=)
x - Matrix or data frame.
use - Specifies the handling of missing data. Options are all.obs (assumes no missing data; missing data will produce an error), everything (any correlation involving a case with missing values will be set to missing), complete.obs (listwise deletion), and pairwise.complete.obs (pairwise deletion).
method - Specifies the type of correlation. The options are pearson, spearman, or kendall.
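The difference between the use= settings shows up clearly on data with missing values. This sketch assumes the built-in airquality data, where Ozone and Solar.R contain NAs; the object names are made up for the example.

```r
# Four numeric columns; Ozone and Solar.R have missing values.
aq <- airquality[c("Ozone", "Solar.R", "Wind", "Temp")]

# Default use = "everything": any pair involving a column with NAs is NA.
r_everything <- cor(aq)
print(r_everything)

# Listwise deletion: only rows complete on all four columns are used.
r_complete <- cor(aq, use = "complete.obs")
print(r_complete)

# Pairwise deletion with rank-based (Spearman) correlations.
r_pairwise <- cor(aq, use = "pairwise.complete.obs", method = "spearman")
print(r_pairwise)

# cov() accepts the same use= and method= arguments.
v_complete <- cov(aq, use = "complete.obs")
```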
Testing correlations for significance
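For a single pair of variables, base R's cor.test() is one way to do this. The sketch below assumes the mpg and wt columns of mtcars as the pair of interest.

```r
# Test H0: the Pearson correlation between mpg and wt is zero.
ct <- cor.test(mtcars$mpg, mtcars$wt,
               alternative = "two.sided", method = "pearson")
print(ct)

# The estimate, p-value and confidence interval can be pulled out by name.
r_hat <- unname(ct$estimate)
p_val <- ct$p.value
ci <- ct$conf.int
```

cor.test() handles one pair at a time, so a full correlation matrix needs a loop or a helper from an add-on package.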