What is Regression Analysis?
A predictive modelling technique that investigates the relationship between a dependent variable (target) and one or more independent variables (predictors)
1. Shows significant relationships between target and predictors
2. Shows the strength of impact of multiple predictors on a target
Regression coefficient: mean change in the response variable for a one-unit change in the predictor variable, holding the other predictors in the model constant.
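A minimal sketch of that interpretation. The model and its coefficients (intercept 2.0, b1 = 3.0, b2 = -1.5) are made up for illustration and are not fitted from any data.

```python
# Hypothetical fitted model: y = 2.0 + 3.0*x1 - 1.5*x2 (illustrative values only)
def predict(x1, x2, intercept=2.0, b1=3.0, b2=-1.5):
    return intercept + b1 * x1 + b2 * x2

base = predict(x1=4.0, x2=10.0)
bumped = predict(x1=5.0, x2=10.0)   # x1 up by one unit, x2 held constant
change = bumped - base              # equals b1

print(change)  # 3.0
```

The change in the prediction equals the coefficient of x1, whatever value x2 is held at.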
Polynomial Regression (y=a+b*x^2...)
Best fit is a curve through the data points, not a straight line
Higher degree polynomial -> over-fitting risk
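A sketch of the over-fitting risk, assuming made-up, roughly quadratic data. The helper names (`solve`, `polyfit`, `sse`) are hypothetical and fit the polynomial through the normal equations. Training error keeps falling as the degree rises, and the degree-5 curve passes through all six points, noise included.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ...] from the normal equations."""
    k = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(A, b)

def sse(xs, ys, coeffs):
    """Sum of squared errors of the fitted polynomial on (xs, ys)."""
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [4.3, 0.8, 0.1, 1.2, 3.7, 9.4]      # about x^2 plus noise (made up)

errors = {d: sse(xs, ys, polyfit(xs, ys, d)) for d in (1, 2, 5)}
for degree, err in errors.items():
    print(degree, err)
```

A near-zero training error at degree 5 says nothing about new data, so compare degrees on held-out points rather than on the training set.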
Multicollinearity test: variance inflation factor (VIF >= 5 signals a problem)
Increases the variance of the coefficient estimates (makes them sensitive to small changes in the model)
Stepwise regression works less well
Doesn’t affect the overall fit of the model
Doesn't produce bad predictions
- Standardizing predictors
- Removing highly correlated predictors
- Linearly combining predictors (e.g. sum)
- Different analyses: partial least squares (PLS) or principal component analysis (PCA)
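A sketch of the VIF test for the special case of two predictors. VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others; with only two predictors that R^2 is the squared Pearson correlation. The data and function names are made up for illustration.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); valid as written only for exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_correlated = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]   # almost 2 * x1
x2_unrelated = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

vif_high = vif_two_predictors(x1, x2_correlated)
vif_low = vif_two_predictors(x1, x2_unrelated)
print(vif_high >= 5, vif_low >= 5)   # True False
```

With more than two predictors, R^2 has to come from a full regression of each predictor on all the others.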
Stepwise regression: maximize prediction power with the minimum number of predictor variables
Fits the regression model by adding/dropping covariates one at a time
- Standard stepwise regression adds and removes predictors as needed at each step.
- Forward selection starts with the most significant predictor and adds a variable at each step.
- Backward elimination starts with all predictors and removes the least significant variable at each step.
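A rough sketch of forward selection. As a simplifying assumption it ranks predictors by how much they reduce the sum of squared errors instead of by a significance test, and it stops when the gain in R^2 drops below a hypothetical threshold `min_r2_gain`. All names and data are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_sse(columns, y):
    """SSE of a least-squares fit of y on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(A, rhs)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def forward_select(predictors, y, min_r2_gain=0.01):
    """Greedily add the predictor that lowers SSE most; stop on small R^2 gain."""
    total = fit_sse([], y)                    # intercept-only model
    selected, remaining, current = [], list(predictors), total
    while remaining:
        trial = {name: fit_sse([predictors[s] for s in selected + [name]], y)
                 for name in remaining}
        best = min(trial, key=trial.get)
        if (current - trial[best]) / total < min_r2_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = trial[best]
    return selected

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
c = [0.5, 0.2, -0.3, 0.4, -0.1, -0.4, 0.3, -0.2]      # unrelated to y
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.05]
y = [3 * ai - 2 * bi + ni for ai, bi, ni in zip(a, b, noise)]

selected = forward_select({"a": a, "b": b, "c": c}, y)
print(selected)
```

Backward elimination would run the same loop in reverse: start from all predictors and drop the one whose removal costs the least.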
Regularized Linear Models (Shrinkage)
Regularizes a linear model by constraining its weights
Regularization term added to the cost function: the learning algorithm not only fits the data but also keeps the model weights as small as possible.
Ridge (L2) | Lasso (L1) | ElasticNet (L1 & L2)
Ridge (L2): adds a penalty equivalent to the square of the magnitude of the coefficients
Minimization = LS Obj + α * (sum of squares of coefficients)
Shrinks the coefficient values, but they do not reach zero
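A sketch of the ridge shrinkage in the simplest setting, assuming one predictor and no intercept. Minimizing sum((y - b*x)^2) + α * b^2 has the closed form b = sum(x*y) / (sum(x^2) + α). The data are made up.

```python
def ridge_coefficient(xs, ys, alpha):
    """Minimizer of sum((y - b*x)^2) + alpha * b^2 (one predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

ridge_coefs = [ridge_coefficient(xs, ys, a) for a in (0.0, 1.0, 10.0, 1000.0)]
print(ridge_coefs)   # shrinks as alpha grows, never exactly zero
```

A larger α only enlarges the denominator, so the coefficient approaches zero without reaching it.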
Lasso (L1): adds a penalty equivalent to the absolute value of the magnitude of the coefficients
Minimization = LS Obj + α * (sum of abs value of coefficients)
LS Obj - Least Squares objective
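A sketch of the absolute-value penalty in the same simplified setting, assuming one predictor and no intercept. Minimizing sum((y - b*x)^2) + α * |b| gives a soft-thresholded solution: sum(x*y) is pulled toward zero by α/2 and clipped at zero. The data are made up.

```python
import math

def lasso_coefficient(xs, ys, alpha):
    """Minimizer of sum((y - b*x)^2) + alpha * |b| (one predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - alpha / 2.0, 0.0)    # soft-thresholding step
    return math.copysign(shrunk, sxy) / sxx

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

lasso_coefs = [lasso_coefficient(xs, ys, a) for a in (0.0, 20.0, 200.0)]
print(lasso_coefs)   # the last coefficient is exactly 0.0
```

Unlike the squared penalty, a large enough α sets the coefficient to exactly zero.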
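ElasticNet: mix of Ridge and Lasso; 'r' controls the mix ratio.
Minimization = LS Obj + r * α * (sum of abs value of coefficients) + ((1 - r) / 2) * α * (sum of squares of coefficients)
r = 0 -> Ridge penalty only, r = 1 -> Lasso penalty only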
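A sketch of the mixed penalty. It assumes the convention used by scikit-learn's `l1_ratio`, where r weights the L1 term and (1 - r) / 2 weights the L2 term. The function name and coefficient values are illustrative.

```python
def elastic_net_penalty(coefs, alpha, r):
    """r * alpha * sum|b| + (1 - r) / 2 * alpha * sum(b^2); r is the L1 share (assumed)."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return r * alpha * l1 + (1.0 - r) / 2.0 * alpha * l2

coefs = [3.0, -4.0]                                     # sum|b| = 7, sum(b^2) = 25
pure_lasso = elastic_net_penalty(coefs, alpha=1.0, r=1.0)
pure_ridge = elastic_net_penalty(coefs, alpha=1.0, r=0.0)
mixed = elastic_net_penalty(coefs, alpha=1.0, r=0.5)

print(pure_lasso, pure_ridge, mixed)   # 7.0 12.5 9.75
```

The factor 1/2 on the squared term is a convention; other sources fold it into α.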
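Regression techniques are mostly driven by three metrics: number of independent variables, type of dependent variable, and shape of the regression line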
Linear Regression (Y=a+b*X + e)
Straight line (regression line)
Least squares method finds the best-fit line
Requires a linear relationship between predictors and target
Suffers from multicollinearity, autocorrelation, heteroskedasticity
Very sensitive to outliers
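A minimal sketch of the least squares fit for one predictor, and of how a single outlier moves the line. The closed form is b = Sxy / Sxx and a = mean(y) - b * mean(x). The data are made up.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]          # exactly y = 1 + 2x
a, b = fit_line(xs, ys)

ys_outlier = ys[:-1] + [31.0]            # last point pushed up by 20
a_out, b_out = fit_line(xs, ys_outlier)

print(a, b)          # 1.0 2.0
print(a_out, b_out)  # -7.0 6.0
```

One shifted point out of five triples the slope, because squared errors weight large residuals heavily.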
Logistic Regression
Target binary (0/1): binomial distribution
Widely used for classification problems
Can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio
Coefficients are fitted by maximum likelihood estimation
Requires large sample sizes
Ordinal Target -> Ordinal logistic regression
Multiclass Target -> Multinomial Logistic regression.
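A rough sketch of maximum likelihood fitting for a binary target, using plain gradient ascent on the log-likelihood with one predictor. The learning rate, step count, and data are illustrative assumptions. Real libraries use faster solvers.

```python
import math

def sigmoid(z):
    """Inverse of the logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Binomial log-likelihood of the data under p = sigmoid(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, lr=0.05, steps=5000):
    """Gradient ascent on the average log-likelihood, starting from zero."""
    n = len(xs)
    b0 = b1 = 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]        # overlapping classes keep the estimate finite

b0, b1 = fit_logistic(xs, ys)
print(b0, b1, sigmoid(b0 + b1 * 4.0))
```

With perfectly separable classes the likelihood has no finite maximum and the coefficients would keep growing.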
odds = p / (1 - p)   # probability of event / probability of no event
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk
p is the probability of presence of the characteristic of interest.
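The formulas above can be sketched directly in code. The coefficients `b0` and `b1` in the last lines are hypothetical values chosen for illustration.

```python
import math

def odds(p):
    """Event probability divided by non-event probability."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds of p."""
    return math.log(odds(p))

def inverse_logit(z):
    """Recover p from a log-odds value z."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.75
print(odds(p))                   # 3.0
print(logit(0.5))                # 0.0, even odds
print(inverse_logit(logit(p)))   # back to about 0.75

# Hypothetical coefficients: logit(p) = b0 + b1 * x
b0, b1 = -1.0, 0.5
p_at_x4 = inverse_logit(b0 + b1 * 4.0)
```

Each one-unit increase in x adds b1 to the log-odds, so it multiplies the odds by exp(b1).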