What is Regression Analysis?
A predictive modelling technique that investigates the relationship between a dependent variable (target) and independent variable(s) (predictors) |
1. Shows significant relationships between the target and the predictors |
2. Shows the strength of impact of multiple predictors on a target |
Coefficients
Mean change in the response variable for one unit of change in the predictor variable while holding other predictors in the model constant. |
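As an illustrative sketch of reading coefficients this way (the size/age/price data below is made up), using an ordinary least squares fit in statsmodels:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: price explained by size and age
rng = np.random.default_rng(1)
size = rng.uniform(50, 200, 100)
age = rng.uniform(0, 40, 100)
price = 3.0 * size - 1.5 * age + rng.normal(scale=10, size=100)

X = sm.add_constant(np.column_stack([size, age]))
fit = sm.OLS(price, X).fit()
# Each coefficient = mean change in price per one-unit change in that
# predictor, holding the other predictor constant
print(fit.params)  # [intercept, size coef (~3.0), age coef (~-1.5)]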
Polynomial Regression (y = a + b*x + c*x^2 + ...)
Fits a curve rather than a straight line |
Higher-degree polynomial -> over-fitting risk |
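A minimal sketch with scikit-learn, assuming synthetic quadratic data for illustration:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic quadratic data (illustrative assumption)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(scale=0.5, size=100)

# degree=2 fits a curve; raising the degree increases over-fitting risk
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.5]]))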
Multicollinearity
Multicollinearity test: variance inflation factors (VIF >= 5 signals a problem) |
Increases the variance of the coefficient estimates (makes them sensitive to small changes in the model) |
Makes stepwise regression work less well |
Doesn't affect the overall fit of the model |
Doesn't produce bad predictions |
SOLUTIONS |
- Standardize the predictors |
- Remove highly correlated predictors |
- Linearly combine predictors (e.g. sum them) |
- Use different analyses: PLS or PCA |
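One way to run the VIF test in Python is with statsmodels; the column names and data below are assumptions for illustration:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical predictors; x2 is nearly a linear combination of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1,
                  "x2": x1 * 2 + rng.normal(scale=0.1, size=200),
                  "x3": rng.normal(size=200)})

Xc = add_constant(X)  # compute VIF with an intercept present
vif = pd.Series([variance_inflation_factor(Xc.values, i)
                 for i in range(1, Xc.shape[1])], index=X.columns)
print(vif)  # VIF >= 5 flags problematic collinearity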
Stepwise Regression
Maximizes prediction power with the minimum number of predictor variables |
Fits the regression model by adding/dropping covariates one at a time |
- Standard stepwise regression adds and removes predictors as needed at each step. |
- Forward selection starts with the most significant predictor and adds a variable at each step. |
- Backward elimination starts with all predictors and removes the least significant variable at each step. |
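scikit-learn has no p-value-based stepwise procedure, but SequentialFeatureSelector implements forward/backward selection by cross-validated score, a close analogue; n_features_to_select=4 below is an arbitrary assumption:

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Forward selection: start empty, add the best-scoring predictor each step
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=4,
                                direction="forward")  # or "backward"
sfs.fit(X, y)
print(sfs.get_support())  # mask of the selected predictors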
Regularized Linear Models (Shrinkage)
Regularizes a linear model by constraining its weights |
A regularization term is added to the cost function: the learning algorithm not only fits the data but also keeps the model weights as small as possible. |
Ridge (L2) | Lasso (L1) | ElasticNet (L1 & L2) |
Ridge Regression
L2: adds a penalty equivalent to the square of the magnitude of the coefficients |
Minimization = LS Obj + α * (sum of squares of coefficients) |
Shrinks the coefficients but never reaches exactly zero |
Lasso Regression
L1: adds a penalty equivalent to the absolute value of the magnitude of the coefficients |
Minimization = LS Obj + α * (sum of absolute values of coefficients) |
Can shrink some coefficients exactly to zero, so it also performs feature selection |
LS Obj - Least Squares objective
ElasticNet Regression
Combines Ridge and Lasso: 'r' controls the mix ratio. |
Penalty = r * λ * sum(abs(β)) + ((1 - r) / 2) * λ * sum(β^2)   (r = 1 -> Lasso, r = 0 -> Ridge) |
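A side-by-side sketch of the three shrinkage models in scikit-learn; the alpha and l1_ratio values are arbitrary assumptions (scikit-learn calls the mix ratio l1_ratio rather than r):

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

for model in (Ridge(alpha=1.0),                      # L2: shrinks, never zeroes
              Lasso(alpha=1.0),                      # L1: can zero out coefficients
              ElasticNet(alpha=1.0, l1_ratio=0.5)):  # mix of L1 and L2
    model.fit(X, y)
    n_zero = np.sum(model.coef_ == 0)
    print(type(model).__name__, "zero coefficients:", n_zero)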
Regression Types
Techniques are mostly driven by three metrics: the number of independent variables, the type of dependent variable, and the shape of the regression line
Linear Regression (Y = a + b*X + e)
Fits a straight line (the regression line) |
Uses the least squares method to find the best-fit line |
Assumes a linear relationship between predictors and target |
CONS |
Suffers from multicollinearity, autocorrelation, and heteroskedasticity |
Very sensitive to outliers |
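A minimal least-squares fit with scikit-learn; the synthetic data (a = 2, b = 0.5 plus noise) is an assumption for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Y = a + b*X + e with a=2, b=0.5 plus noise (illustrative)
rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 + 0.5 * X[:, 0] + rng.normal(scale=1.0, size=50)

line = LinearRegression().fit(X, y)  # solves the least-squares problem
print("a (intercept):", line.intercept_)
print("b (slope):", line.coef_[0])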
Logistic Regression
Target is binary (0/1): binomial distribution |
Uses the logit function |
Widely used for classification problems |
Can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio |
Coefficients are estimated by maximum likelihood, not least squares |
Requires large sample sizes |
Ordinal target -> ordinal logistic regression |
Multiclass target -> multinomial logistic regression |
Logistic Regression Formulas
odds = p / (1-p)   # event probability / non-event probability
ln(odds) = ln(p / (1-p))
logit(p) = ln(p / (1-p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk
p is the probability of presence of the characteristic of interest.
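A short sketch tying the logit formula to code; the data is synthetic, and scikit-learn fits the b's by maximum likelihood:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary target driven by one predictor (illustrative)
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 1))
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * X[:, 0])))  # inverse logit
y = rng.binomial(1, p_true)

clf = LogisticRegression().fit(X, y)
b0, b1 = clf.intercept_[0], clf.coef_[0, 0]
# logit(p) = b0 + b1*X  ->  p = 1 / (1 + exp(-(b0 + b1*X)))
print("log-odds at X=1:", b0 + b1 * 1.0)
print("p at X=1:", clf.predict_proba([[1.0]])[0, 1])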