
Regression Cheat Sheet (DRAFT)

This is a draft cheat sheet. It is a work in progress and is not finished yet.

What is Regression Analysis?

A predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
1. Shows significant relationships between the target and the predictors
2. Shows the strength of impact of multiple predictors on a target


Regression Coefficients

Mean change in the response variable for one unit of change in a predictor variable, holding the other predictors in the model constant.
Think of them as slopes.
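A minimal NumPy sketch of the "slopes" reading (the data and the true coefficients 2 and -3 are made up for illustration):

```python
import numpy as np

# Hypothetical data: y depends exactly on two predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 5.0 + 2.0 * x1 - 3.0 * x2  # true slopes: 2 and -3

X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept column first
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta ~ [5, 2, -3]: y rises 2 units per unit of x1, holding x2 fixed
print(beta)
```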

Polynomial Regression (y = a + b*x^2 + ...)

Fits a curve rather than a straight line
Higher-degree polynomials -> over-fitting risk
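A NumPy sketch of the over-fitting risk, on made-up quadratic data: the degree-2 fit recovers the true coefficients, while a needlessly high-degree fit matches the training noise more closely and becomes unreliable outside the data range.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-3, 3, 30)
y = 1.0 + 2.0 * x ** 2 + 0.5 * rng.normal(size=30)  # truly quadratic

good = np.polyfit(x, y, 2)    # matches the data-generating degree
wild = np.polyfit(x, y, 15)   # high degree: chases the noise

# Extrapolate just outside the training range; the high-degree fit
# typically strays far from the true curve there.
print(np.polyval(good, 3.5), np.polyval(wild, 3.5))
```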


Multicollinearity test: variance inflation factor (VIF >= 5 signals a problem)
Increases the variance of the coefficient estimates (makes them sensitive to small changes in the model)
Stepwise regression does not work as well
Doesn't affect the overall fit of the model
Doesn't produce bad predictions
Remedies:
- Standardize the predictors
- Remove highly correlated predictors
- Linearly combine the predictors (e.g. their sum)
- Use a different analysis: PLS or PCA
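The VIF itself is easy to compute by hand: regress each predictor on the others and take 1 / (1 - R²). A NumPy sketch on made-up data (x2 is deliberately near-collinear with x1):

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress it on the other columns,
    then VIF = 1 / (1 - R^2)."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))  # x1, x2 large (>= 5); x3 near 1
```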

Stepwise Regression

Maximizes prediction power with the minimum number of predictor variables
Fits the regression model by adding/dropping covariates one at a time
- Standard stepwise regression adds and removes predictors as needed at each step.
- Forward selection starts with the most significant predictor and adds a variable at each step.
- Backward elimination starts with all predictors and removes the least significant variable at each step.
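A sketch of the forward-selection variant in plain NumPy (the data, the greedy RSS criterion, and the helper name are illustrative; real implementations use significance tests or cross-validation as the stopping rule):

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward selection: at each step, add the predictor
    that most reduces the residual sum of squares."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in remaining:
            A = np.column_stack([np.ones(len(y)), X[:, chosen + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = ((y - A @ coef) ** 2).sum()
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 5))
y = 4.0 * X[:, 3] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=150)
print(forward_select(X, y, 2))  # picks column 3 first, then column 1
```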

Regularized Linear Models (Shrinkage)

Regularizes a linear model by constraining the weights
A regularization term is added to the cost function: the learning algorithm not only fits the data but also keeps the model weights as small as possible.
Ridge (L2) | Lasso (L1) | ElasticNet (L1 & L2)

Ridge Regression

L2: adds a penalty equivalent to the square of the magnitude of the coefficients
Minimization = LS Obj + α * (sum of squares of coefficients)
It shrinks the value of the coefficients but never reaches exactly zero
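Ridge has a closed form, so the shrink-but-never-zero behaviour is easy to see in a NumPy sketch (data and coefficients are made up; the intercept is handled by centring and omitted from the penalty):

```python
import numpy as np

def ridge(X, y, alpha):
    """Closed-form ridge on centred data: beta = (X'X + alpha*I)^-1 X'y.
    Centring handles the intercept, which is left unpenalized."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

for alpha in (0.0, 10.0, 1000.0):
    # Coefficients shrink toward zero as alpha grows, but never hit it.
    print(alpha, ridge(X, y, alpha))
```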

Lasso Regression

L1: adds a penalty equivalent to the absolute value of the magnitude of the coefficients
Minimization = LS Obj + α * (sum of absolute values of coefficients)
Can shrink some coefficients exactly to zero (automatic feature selection)
LS Obj = Least Squares objective
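With a single standardized predictor, the lasso solution has a closed form: soft-threshold the OLS slope by α. A NumPy sketch (made-up data) of why lasso, unlike ridge, can drop a coefficient to exactly zero:

```python
import numpy as np

def lasso_1d(x, y, alpha):
    """Single-predictor lasso (no intercept): soft-threshold the OLS slope.
    Minimizes (1/2n)*sum((y - b*x)^2) + alpha*|b|."""
    b_ols = (x @ y) / (x @ x)     # OLS slope
    scale = (x @ x) / len(x)      # ~1 for a standardized predictor
    return np.sign(b_ols) * max(abs(b_ols) - alpha / scale, 0.0)

rng = np.random.default_rng(6)
x = rng.normal(size=200)
x = (x - x.mean()) / x.std()      # standardize the predictor
y = 0.5 * x + 0.1 * rng.normal(size=200)

print(lasso_1d(x, y, 0.0))   # ~0.5 (plain OLS)
print(lasso_1d(x, y, 0.2))   # shrunk toward zero
print(lasso_1d(x, y, 1.0))   # exactly 0.0: the coefficient is dropped
```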

ElasticNet Regression

Combines the Ridge and Lasso penalties; 'r' controls the mix ratio (r = 1 -> Lasso, r = 0 -> Ridge).
Penalty = λ * ( r * sum(abs(β)) + ((1 - r) / 2) * sum(β²) )
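A small NumPy sketch of the mixed penalty, assuming the common convention in which r = 1 recovers the pure L1 (lasso) term and r = 0 the halved L2 (ridge) term:

```python
import numpy as np

def enet_penalty(beta, lam, r):
    """ElasticNet penalty: lam * ( r*L1 + ((1-r)/2)*L2 ).
    r = 1 -> lasso penalty, r = 0 -> (halved) ridge penalty."""
    return lam * (r * np.abs(beta).sum() + (1 - r) / 2 * (beta ** 2).sum())

beta = np.array([1.0, -2.0, 3.0])
print(enet_penalty(beta, 2.0, 1.0))   # pure L1 term
print(enet_penalty(beta, 2.0, 0.0))   # pure (halved) L2 term
print(enet_penalty(beta, 2.0, 0.5))   # an even mix
```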

Regression Types

Techniques are mostly driven by three metrics: the number of independent variables, the type of dependent variable, and the shape of the regression line.

Linear Regression (Y = a + b*X + e)

Fits a straight line (the regression line)
Uses the Least Squares method to find the best-fitting line
Assumes a linear relationship between the predictors and the target
Watch for multicollinearity, autocorrelation, and heteroskedasticity
Very sensitive to outliers
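A NumPy sketch of a least-squares line fit and of the outlier sensitivity (data made up; corrupting a single point visibly drags the slope):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + 0.2 * rng.normal(size=50)  # true slope 2, intercept 1

slope, intercept = np.polyfit(x, y, 1)         # least-squares line

y_out = y.copy()
y_out[-1] += 100.0                             # one extreme outlier
slope_out, _ = np.polyfit(x, y_out, 1)

# The single outlier pulls the slope noticeably away from 2.
print(slope, slope_out)
```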

Logistic Regression

Binary target (0/1): binomial distribution
Uses the logit function
Widely used for classification problems
Can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio
Fitted by maximum likelihood estimation
Requires large sample sizes
Ordinal target -> ordinal logistic regression
Multiclass target -> multinomial logistic regression

Logistic Regression

odds = p / (1 - p)  # event probability / non-event probability
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + ... + bk*Xk
p is the probability of presence of the characteristic of interest.
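The odds/logit definitions can be checked numerically; a small sketch with p = 0.8, showing that the logistic (sigmoid) function inverts the logit:

```python
import math

def logistic(z):
    """Inverse of the logit: maps a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))

p = 0.8
odds = p / (1 - p)        # 4.0: the event is 4x as likely as the non-event
logit = math.log(odds)    # ln(odds), the scale the linear model works on
print(odds, logit, logistic(logit))  # round-trips back to p = 0.8
```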