Basic Math
exp(x) | Exponential
sum(x) | Sum
log(x) | Natural log
cumsum(x) | Cumulative sum
max(x) | Largest element
ceiling(x) | Round up
min(x) | Smallest element
floor(x) | Round down
x %% y | Modulo
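The rounding and modulo rows behave in a specific way for negative numbers. Below is a small sketch on made-up values that shows this. Note that R's %% returns a result with the sign of the divisor. The integer-division operator %/% is not in the table above; it is shown only as the counterpart of %%.

```r
# Made-up example values
x <- c(1.2, -1.5, 3.7)

ceiling(x)          # round up:   2 -1  4
floor(x)            # round down: 1 -2  3
max(x)              # largest element: 3.7
cumsum(c(1, 2, 3))  # cumulative sum: 1 3 6

# %% is the modulo operator; the result takes the sign of the divisor
7 %% 3    # 1
-7 %% 3   # 2, because -7 = -3 * 3 + 2
7 %/% 3   # integer division: 2
```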
Control Flow
for (variable in sequence) {...} | for-loop. If the loop body contains only a single line, the curly brackets can be omitted.
while (condition) {...} | while-loop
if (i > 5) {
...
} else {
...
} | if-else-block
foo = function(arg1, arg2, ...) {
...
return(var)
} | function
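The sketch below combines the four constructs in runnable form. All three helper functions are hypothetical and exist only for illustration. One detail worth knowing: at the top level, else must stand on the same line as the closing brace of the if block. That is why the if-else row is written as "} else {".

```r
# Hypothetical helper: counts the elements of v above threshold (for-loop + if)
count_above <- function(v, threshold) {
  n <- 0
  for (value in v) {
    if (value > threshold) n <- n + 1   # single-line body, no braces needed
  }
  return(n)
}

# Hypothetical helper: if-else block with an extra else-if branch
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# Hypothetical helper: while-loop that halves x until it drops below 1
halvings <- function(x) {
  steps <- 0
  while (x >= 1) {
    x <- x / 2
    steps <- steps + 1
  }
  steps   # the last evaluated expression is returned implicitly
}

count_above(c(3, 7, 9, 2), 5)   # 2
sign_label(-3)                  # "negative"
halvings(8)                     # 8 -> 4 -> 2 -> 1 -> 0.5, i.e. 4 steps
```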
Vectors
Creating Vectors
c(2, 4, 6) | Join elements into a vector
2:6 | An integer sequence (end inclusive!)
seq(2, 3, by=0.5) | Complex sequence (cf. np.linspace; seq(0, 1, length.out=5) fixes the number of points instead of the step size)
rep(1:2, times=3) | Repeat vector
rep(1:2, each=3) | Repeat each element
Functions
sort(x) | Return x sorted.
rev(x) | Return x reversed.
unique(x) | See unique values.
length(x) | Length of x.
Selecting Vector Elements
By Position
x[4] | The fourth element
x[-4] | All but the fourth.
x[2:4] | Elements two to four
x[-(2:4)] | All elements except two to four
x[c(1, 5)] | Elements one and five.
By Value
x[x == 10] | All elements equal to 10
x[x < 10] | All elements less than 10.
x[x %in% c(1, 2, 5)] | Elements in the given set (here 1, 2, 5).
Named Vectors
x['apple'] | Element with name 'apple'.
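Here the vector rows are applied to one made-up vector, so you can compare the results side by side. The named vector "fruit" is a hypothetical example.

```r
x <- c(10, 3, 10, 7, 1)   # made-up data

# creating vectors
seq(2, 3, by = 0.5)         # 2.0 2.5 3.0
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00, like np.linspace(0, 1, 5)
rep(1:2, times = 3)         # 1 2 1 2 1 2
rep(1:2, each = 3)          # 1 1 1 2 2 2

# functions
sort(x)     # 1 3 7 10 10
rev(x)      # 1 7 10 3 10
unique(x)   # 10 3 7 1
length(x)   # 5

# selecting by position and by value
x[4]                # 7
x[-4]               # 10 3 10 1
x[-(2:4)]           # 10 1
x[c(1, 5)]          # 10 1
x[x == 10]          # 10 10
x[x %in% c(1, 3)]   # 3 1

# selecting from a named vector
fruit <- c(apple = 1.5, pear = 2.0)
fruit["apple"]      # apple 1.5
```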
Tables
tab = table(x) | Get absolute frequencies of values
as.numeric(names(tab)); as.vector(tab) | Extract the values and their absolute frequencies from the table
prop.table(tab) | Compute relative frequencies (same as tab / length(x))
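The next sketch separates the distinct values (stored as the table's names) from their counts on a made-up sample. It also shows that prop.table() returns the counts divided by the sample size.

```r
x <- c(2, 3, 3, 5, 5, 5)            # made-up sample
tab <- table(x)                     # absolute frequencies

values <- as.numeric(names(tab))    # the distinct values: 2 3 5
counts <- as.vector(tab)            # their absolute frequencies: 1 2 3
rel <- prop.table(tab)              # relative frequencies: 1/6 2/6 3/6
```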
Matrices
m = matrix(x, nrow = 3, ncol = 3) | Create a matrix from vector x (filled column by column)
t(m) | Transpose
m %*% n | Matrix multiplication
det(m) | Determinant
eigen(m) | Find eigenvectors and eigenvalues
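The matrix rows are shown below on a made-up symmetric 3x3 matrix. The sketch also uses solve(), which is not listed above, to solve a linear system. The checks confirm the defining property of an eigenpair, m v = lambda v.

```r
# Made-up symmetric 3x3 matrix; matrix() fills column by column by default
m <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, ncol = 3)
n <- c(1, 2, 3)

t(m)              # transpose (equal to m here, because m is symmetric)
m %*% m           # matrix product; m * m would multiply element-wise
z <- solve(m, n)  # solves the linear system m %*% z = n
det(m)            # determinant: 2 * (3 * 4 - 1) - 1 * (4 - 0) = 18
e <- eigen(m)     # e$values: eigenvalues, e$vectors: eigenvectors as columns
```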
Data sets
Data = data.frame(price=c(11,20,14,15), number=c(40,50,60,20)) | Create a data set
Interacting with data sets
col_1 = data$col_1_name | Access column data
Data[1,2]; Data[,2]; Data[[1]] | Access data with index notation (element in row 1, column 2; the whole second column; the first column)
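I/O
data = read.csv("file.csv", header=FALSE, sep=",") | Read a CSV file (function arguments similar to those of pandas.read_csv; sep="" would split on any whitespace instead of commas)
write.csv(data, "data.csv", row.names=FALSE) | Write data set as CSV (write.csv always uses "," and ignores a sep argument with a warning; use write.table(data, "data.dat", sep=" ", row.names=FALSE) for other separators)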
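The sketch below writes a made-up data frame to a temporary file and reads it back. This avoids touching real files, since tempfile() only creates a path in the session's temporary directory. It relies on the read.csv defaults header=TRUE and sep=",".

```r
data <- data.frame(price = c(11, 20, 14, 15), number = c(40, 50, 60, 20))

path <- tempfile(fileext = ".csv")        # temporary path, nothing permanent is written
write.csv(data, path, row.names = FALSE)  # no sep argument: write.csv always uses ","
back <- read.csv(path)                    # header = TRUE and sep = "," are the defaults
back
```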
Filter
subset(df, condition) | Filter the rows of a data frame by a logical condition
subset(df, kids == "John" & Grade == 1.3) | Filter on multiple columns
subset(df, kids %in% c("Jack", "John")) | Filter a column by multiple values
unique(housing[, c("State", "region")]) | Extract unique rows
Sort
housing[order(housing$Home.Value), ] | Order data frame in ascending order
housing[order(housing$Home.Value, decreasing = TRUE), ] | Order data frame in descending order
Meta
dim(df) | Check the dimensions of a data frame
names(df) | Return the column names
Manipulate data
Data_Frame_New <- Data_Frame[-c(1), -c(1)] | Remove columns and/or rows from data frame (here the first row and the first column)
rbind(df1, df2) | Combine data frames vertically (the column names must match)
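Several data-frame rows are applied below to the data set from "Create a data set". The extra row "more" is made up. One pitfall appears along the way: when only a single column remains, indexing returns a plain vector unless drop = FALSE is given.

```r
Data <- data.frame(price = c(11, 20, 14, 15), number = c(40, 50, 60, 20))

Data[1, 2]                                   # row 1, column 2: 40
subset(Data, price > 12 & number >= 50)      # rows 2 and 3
Data[order(Data$price, decreasing = TRUE), ] # rows in the order 2, 4, 3, 1
dim(Data)                                    # 4 2
names(Data)                                  # "price" "number"

# removing the first row and column leaves one column, which is returned as a vector
Data[-c(1), -c(1)]                 # 50 60 20
Data[-c(1), -c(1), drop = FALSE]   # data frame with the single column "number"

more <- data.frame(price = 9, number = 70)   # made-up extra row
rbind(Data, more)                            # 5 rows
```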
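I/O
write(data, "mydata.dat") | Write data to a plain-text file (write() is a wrapper around cat(), so the output is text, not binary; for binary output see save() or writeBin())
| Read binary data.
getwd() | Current working directory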
Random Numbers
sample(1:3, size=20, replace=TRUE, prob=c(1/6,1/3,1/2)) | Draw 20 balls, labeled from 1 to 3, with replacement from a box in which the labels have probabilities 1/6, 1/3 and 1/2.
r<distr. ID>(n, params) | Draw n numbers from distribution <distr. ID> with parameters params (see Distributions in R for more details)
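A short sketch of both rows follows. The seed value 42 is arbitrary; it only makes the draws reproducible. The normal distribution (ID "norm") stands in for a generic <distr. ID>.

```r
set.seed(42)   # arbitrary seed, makes the draws reproducible
draws <- sample(1:3, size = 20, replace = TRUE, prob = c(1/6, 1/3, 1/2))
table(draws)                       # absolute frequencies of the drawn labels

z <- rnorm(5, mean = 0, sd = 1)    # r<distr. ID>(n, params) with distr. ID "norm"
```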
Characteristics of data sequences
mean(x) | Arithmetic mean of the data sequence
var(x) | Variance (sample variance with denominator n-1)
median(x) | Median
quantile(x, p, type=7) | Quantile. type=7 is the default computation algorithm: the function returns the value at position k=1+p(n-1) of the sorted data if k is an integer. Otherwise, R linearly interpolates between the two neighboring order statistics.
quantile(x, p, type=1) | General inverse function of the ECDF (smallest p-quantile). The largest p-quantile can be obtained indirectly by slightly increasing p.
summary(x) | Overview of important measures
cov(x, y) | Covariance
cor(x, y) | Correlation
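To make the type=7 rule concrete, the sketch below recomputes it by hand and compares the result with quantile(). The helper q7 is hypothetical and the data are made up. For this data, k = 1 + 0.3 * 4 = 2.2, so the result lies 20% of the way from the 2nd to the 3rd order statistic.

```r
# Hypothetical helper reproducing quantile(type = 7)
q7 <- function(x, p) {
  xs <- sort(x)
  k <- 1 + p * (length(xs) - 1)   # position in the sorted data
  lo <- floor(k)
  hi <- ceiling(k)
  # linear interpolation between the neighbouring order statistics (equal if k is an integer)
  xs[lo] + (k - lo) * (xs[hi] - xs[lo])
}

x <- c(7, 1, 4, 10, 3)          # made-up data, sorted: 1 3 4 7 10
q7(x, 0.3)                      # 3 + 0.2 * (4 - 3) = 3.2
quantile(x, 0.3)                # 30%: 3.2
quantile(x, 0.3, type = 1)      # smallest p-quantile (inverse ECDF): 3
var(x)                          # 12.5, i.e. sum((x - mean(x))^2) / (n - 1)
```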
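Distributions in R
General usage
d<distr. ID>(x, params) | density function (probability mass function for discrete distributions)
q<distr. ID>(p, params) | quantile function. Always computes the smallest quantile
p<distr. ID>(q, params) | cumulative distribution function
r<distr. ID>(n, params) | random variate generation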
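Distributions
dbinom(x, size=n, prob=p) | Binomial
dchisq(x, df=k) | Chi-squared
dexp(x, rate=l) | Exponential
dgamma(x, shape=r, rate=l) | Gamma
dgeom(x, prob=p) | Geometric (x counts the failures before the first success)
dnbinom(x, size=r, prob=p) | Negative binomial
dnorm(x, mean=mu, sd=sigma) | Normal
dpois(x, lambda=l) | Poisson
dt(x, df=k) | t-distribution
dunif(x, min=a, max=b) | Uniform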
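The four prefixes d, p, q and r share the same parameters for a given distribution ID. The sketch below uses Bin(5, 0.3) as an arbitrary example. It checks that the CDF is the running sum of the probability mass function, that qbinom returns the smallest x with P(X <= x) >= p, and that qnorm and pnorm invert each other.

```r
dbinom(2, size = 5, prob = 0.3)          # P(X = 2) for X ~ Bin(5, 0.3)
pbinom(2, size = 5, prob = 0.3)          # P(X <= 2)
sum(dbinom(0:2, size = 5, prob = 0.3))   # same value as pbinom(2, ...)
qbinom(0.5, size = 5, prob = 0.3)        # smallest x with P(X <= x) >= 0.5: 1
qnorm(0.975)                             # about 1.96
pnorm(qnorm(0.975))                      # back to 0.975
dgamma(1, shape = 1, rate = 2)           # Gamma with shape 1 is Exp(2): equals dexp(1, rate = 2)
```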
Plotting
Basic plots
plot(x) | Plot quick overview.
plot(x, y, xlab="mu", ylab="Power", type="l", col="red", ylim=c(0,1), lwd=1.5) | Plot data with custom style options
Lines and curves
abline(a, b, col="red") | Add a red line with intercept a and slope b to the plot.
abline(v=a, col="red") | Add a vertical line at x=a
abline(h=b, col="red") | Add a horizontal line at y=b
lines(x, y, col="green", lwd=1.5) | Add a line connecting the points (x, y) to the existing plot
curve(sin, -pi, pi, add=TRUE) | Draw a curve of a function over the specified interval (add=TRUE draws into the existing plot)
Data visualization
plot(ecdf(x)) | Plot ECDF.
barplot(x, main="Title", xlab="x label") | Annotated barplot of absolute frequencies (x holds the counts, e.g. x = table(data))
hist(data, prob=TRUE, breaks=30) | Histogram scaled to densities, i.e. relative frequencies divided by the bin width, so the total area is 1. breaks=30 is only a suggestion; R picks about 30 "pretty" bins.
stripchart(x) | 1D-plot
boxplot(data1, data2, ..., range=1.5) | Plot boxplots of one or more data sequences in one window. range determines the extent of the whiskers. Default range=1.5, i.e. 1.5 x IQR
qqnorm(x); qqline(x) | QQ-Plot against standard normal distribution
qqPlot(x, dist="unif", ...) | QQ-Plot against any R-standard distribution (library(car)). Additional arguments such as df and ncp can also be specified.
legend(x, y, legend=c("n=10"), col=c("red"), lty=1, cex=0.8) | Add legend to plot at position (x,y)
Statistical hypothesis testing
One-Sample tests
t.test(x, mu=mu0, alt="less", conf.level=1-alpha) | Performs one and two sample t-tests on vectors of data.
power.t.test(n=100, delta=0.1, sd=2, sig.level=0.1, type="one.sample", alt="one.sided") | Compute the power of the one- or two-sample t-test, or determine parameters to obtain a target power.
binom.test(sum(x), n, p0, alt="greater", conf.level=1-alpha) | Performs an exact test of a simple null hypothesis about the probability of success in a Bernoulli experiment. The decision based on the p-value may differ from the one based on the confidence interval; in such cases, decide based on the p-value.
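The sketch below rebuilds the one-sample t-test by hand on made-up data, with mu0 = 5 as an arbitrary null value. The statistic is (mean - mu0) / (sd / sqrt(n)), and it is compared with a t distribution with n - 1 degrees of freedom. For alternative = "greater", the p-value is the upper tail probability.

```r
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)   # made-up sample
mu0 <- 5                               # arbitrary null value

res <- t.test(x, mu = mu0, alternative = "greater")

# the same statistic and p-value computed by hand
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
p_manual <- pt(t_manual, df = length(x) - 1, lower.tail = FALSE)

res$statistic   # equals t_manual
res$p.value     # equals p_manual
```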
Two-Sample tests
t.test(shoes$A, shoes$B, paired=FALSE, var.equal=TRUE) | Example of an unpaired two-sample t-test assuming equal variances (shoes data from library(MASS))
var.test(x, y, conf.level=1-alpha) | Performs an F test to compare the variances of two samples from normal populations.
GOF tests
shapiro.test(x) | Performs the Shapiro-Wilk test of normality.
chisq.test(table(x), p=p_0) | Test for a distribution with probabilities p_0. If p is not specified, R tests for a uniform distribution
chisq.test(table(x), p=p_0, simulate.p.value=TRUE) | Compute the p-value by Monte Carlo simulation instead of the Chi2-approximation
pwr.chisq.test(w=ncp, df=s-1, sig.level=alpha, power=0.9) | Determine the number of samples needed to reach the desired power at the given significance level (library(pwr); w is the effect size)
ks.test(x, "pnorm", 0, 1) | One-sample Kolmogorov-Smirnov test against a hypothesized distribution
lillie.test(x) | Lilliefors (Kolmogorov-Smirnov) test for the composite hypothesis of normality (library(nortest))
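This sketch recomputes Pearson's statistic sum((observed - expected)^2 / expected) for chisq.test(table(x), p=p_0). The die-like outcomes and the uniform p_0 are made up. With 3 categories there are 2 degrees of freedom. For 2 degrees of freedom the chi-squared upper tail is exp(-X^2 / 2), which gives a closed-form check of the p-value.

```r
x <- c(rep(1, 18), rep(2, 22), rep(3, 20))   # made-up outcomes: 18, 22, 20 times
p_0 <- c(1/3, 1/3, 1/3)

res <- chisq.test(table(x), p = p_0)

obs <- as.vector(table(x))
expected <- sum(obs) * p_0                    # 20 each
stat <- sum((obs - expected)^2 / expected)    # (4 + 4 + 0) / 20 = 0.4

res$statistic   # X-squared 0.4
res$p.value     # exp(-0.4 / 2), about 0.819
```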
Tests of independence
chisq.test(M) | Chi2-test of independence. M has to be a matrix (contingency table)!
fisher.test(M) | Exact test of Fisher. If the table entries are too large, use simulate.p.value=TRUE
runs.test(x) | Runs test of independence. x has to be a factor (use as.factor() if necessary; library(tseries))
Runs Test of Randomness
rle(x) | Compute the lengths and values of runs of equal values in a vector.
rle(x)$lengths | Vector containing the length of each run.
rle(x)$values | Vector of the same length as lengths with the corresponding values.
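Here rle() is applied to a made-up sequence of coin tosses. The number of runs, length(r$lengths), is the quantity a runs test is based on. inverse.rle() reverses the encoding.

```r
x <- c("H", "H", "T", "H", "T", "T", "T")   # made-up coin tosses
r <- rle(x)

r$lengths          # 2 1 1 3
r$values           # "H" "T" "H" "T"
length(r$lengths)  # number of runs: 4
inverse.rle(r)     # rebuilds the original sequence
```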
Optimization
nlm(f, p) | Carries out a minimization of the function f using a Newton-type algorithm, starting at the parameter vector p. May only find a local minimum, not all solutions. f must take the whole parameter vector as its first argument.
E2vec = Vectorize(E2, vectorize.args=c("n")) | Vectorize a function. vectorize.args: explicitly state the arguments to be vectorized.
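Both rows are sketched below with hypothetical functions. The source does not define f or E2, so the quadratic f and E2 (the second moment E[X^2] of a Bin(n, p) variable) are illustrative assumptions. E2 uses 0:n, which only works for a scalar n. Vectorize() makes it accept a vector of n values.

```r
# Hypothetical objective with its minimum at (1, -2)
f <- function(p) (p[1] - 1)^2 + (p[2] + 2)^2
opt <- nlm(f, p = c(0, 0))   # start the search at (0, 0)
opt$estimate                 # close to c(1, -2)
opt$minimum                  # close to 0

# Hypothetical E2: E[X^2] for X ~ Bin(n, p); 0:n requires a scalar n
E2 <- function(n, p) sum(dbinom(0:n, n, p) * (0:n)^2)
E2vec <- Vectorize(E2, vectorize.args = c("n"))
E2vec(1:3, p = 0.5)          # 0.5 1.5 3.0, i.e. n p (1 - p) + (n p)^2
```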
Distribution Fit
fitdistr(x, "Poisson") | Maximum-likelihood fitting of univariate distributions, allowing parameters to be held fixed if desired (library(MASS))
Regression
reg = lm(x ~ t) | Fit a linear function x=a+bt
summary(reg) | Obtain further information about the regression result. Important fields:
- Residual standard error: sd of the residuals (normalized with n-2)
- t value: tests the null hypothesis "estimate is 0", assuming a normally distributed random mechanism
- Multiple R-squared: squared correlation coefficient; the null hypothesis R^2=0 is tested with the F-statistic
lm(x ~ t + I(t^2)) | Fit a polynomial function. I() inhibits R from interpreting t^2 as a formula operator
form = x ~ a/(1+exp(-b*(t-c)))
reg = nls(form, data=USPop, start=c(a=400, b=0.02, c=2000)) | Perform non-linear least-squares regression
| Plot regression result
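The sketch below fits x = a + bt with lm() on made-up data. It checks the coefficients against the closed form b = cov(t, x) / var(t), a = mean(x) - b mean(t). It also confirms that the multiple R-squared of a simple linear regression equals cor(t, x)^2. The last line fits the quadratic model from the table.

```r
t <- c(1, 2, 3, 4, 5)                # made-up predictor
x <- c(2.1, 3.9, 6.2, 7.8, 10.1)     # made-up response

reg <- lm(x ~ t)
coef(reg)                            # intercept a and slope b

# closed-form least-squares estimates
b <- cov(t, x) / var(t)
a <- mean(x) - b * mean(t)

summary(reg)$r.squared               # equals cor(t, x)^2 here
reg2 <- lm(x ~ t + I(t^2))           # polynomial fit: 3 coefficients
```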
Root finding
res = uniroot(func, c(0,10)) | Searches the interval [0, 10] for a root of the function func, which must have opposite signs at the two end points. res$root and res$f.root give the location of the root and the value of the function there.
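Here uniroot() searches [0, 10] for a root of a hypothetical func(x) = x^2 - 2. The function changes sign on that interval (func(0) = -2, func(10) = 98). A tighter tol than the default is passed so that the result matches sqrt(2) closely.

```r
func <- function(x) x^2 - 2          # hypothetical function, root at sqrt(2)
res <- uniroot(func, c(0, 10), tol = 1e-9)
res$root     # about 1.414214
res$f.root   # about 0
```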
Help
?sqrt | Display documentation of the command sqrt
?"+" | Use quotation marks for special characters
Miscellaneous
Printing
print(x) | Default print
sprintf("Formatted %s: %.3f", object, mean) | Formatted print (sprintf() returns a string; wrap it in cat() to print it without quotes)
(y <- sqrt(2)) | Enclose an R command in parentheses to directly print the result
edit(x) | Invoke text editor on R object
Libraries
library(MASS) | Load package MASS
| find 1D root
Step functions
f = stepfun(x, y) | Given the vectors (x1, ..., xn) and (y0, ..., yn) (one value more!), returns an interpolating 'step' function
knots(f) | Returns the jump positions of the step function
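Below is a made-up step function with n = 3 jump positions and n + 1 = 4 levels. With the default right = FALSE, the function is right-continuous, so at a jump position it already takes the new level.

```r
f <- stepfun(c(1, 2, 3), c(0, 10, 20, 30))   # 3 jumps, 4 levels (made-up values)

f(0.5)    # 0, left of the first jump
f(1)      # 10, right-continuous at the jump
f(2.5)    # 20
knots(f)  # 1 2 3
```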
|