What is Regression Analysis?
A form of predictive modelling technique that investigates the relationship between a dependent (target) variable and independent (predictor) variable(s) | 1. Shows significant relationships between the target and the predictors | 2. Shows the strength of impact of multiple predictors on a target |
Coefficients
Mean change in the response variable for one unit of change in a predictor variable, holding the other predictors in the model constant. |
Polynomial Regression (y = a + b*x + c*x^2 + ...)
Fits a curve rather than a straight line | Higher-degree polynomial -> over-fitting risk |
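A minimal sketch of polynomial regression, assuming scikit-learn (the sheet names no library); the data and the degree are illustrative:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: quadratic signal plus noise
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# degree=2 fits a curve; a much higher degree risks over-fitting
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[1.0]]))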
Multicollinearity
Multicollinearity test: variance inflation factors (VIF >= 5) | Increases the variance of the coefficient estimates (makes them sensitive to small changes in the model) | Stepwise regression does not work as well | Doesn't affect the overall fit of the model | Doesn't produce bad predictions | SOLUTIONS | - Standardize predictors | - Remove highly correlated predictors | - Linearly combine predictors (e.g. sum them) | - Different analyses: PLS or PCA |
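A minimal sketch of the VIF test, assuming statsmodels (library choice is an assumption; the data are illustrative, with x2 built to be nearly collinear with x1):

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Illustrative predictors: x2 is roughly 2 * x1
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "x3": [5, 3, 6, 2, 4, 1],
})
X = add_constant(df)  # VIF is computed on a design matrix with an intercept

# VIF >= 5 flags problematic multicollinearity
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))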
Stepwise Regression
Maximizes prediction power with the minimum number of predictor variables | Fits the regression model by adding/dropping covariates one at a time | - Standard stepwise regression adds and removes predictors as needed at each step. | - Forward selection starts with the most significant predictor and adds a variable at each step. | - Backward elimination starts with all predictors and removes the least significant variable at each step. |
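scikit-learn has no p-value-based stepwise procedure; its SequentialFeatureSelector is a close stand-in that adds (or drops) one predictor per step by cross-validated score rather than significance. A minimal sketch (dataset and n_features_to_select are illustrative):

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Forward selection: greedily add the predictor that most improves CV score
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="forward"
)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the kept predictors

# direction="backward" starts with all predictors and drops the weakest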
Regularized Linear Models (Shrinkage)
Regularizes a linear model by constraining the weights | A regularization term is added to the cost function: the learning algorithm not only fits the data but also keeps the model weights as small as possible. | Ridge (L2) | Lasso (L1) | ElasticNet (L1 & L2) |
Ridge Regression
L2: adds a penalty equivalent to the square of the magnitude of the coefficients | Minimization = LS Obj + α * (sum of squares of coefficients) | Shrinks the value of the coefficients but never reaches exactly zero |
Lasso Regression
L1: adds a penalty equivalent to the absolute value of the magnitude of the coefficients | Minimization = LS Obj + α * (sum of absolute values of coefficients) | Can shrink coefficients all the way to zero (feature selection) |
LS Obj - Least Squares objective
ElasticNet Regression
Mix of Ridge and Lasso: 'r' controls the mix ratio (r = 1 -> Lasso, r = 0 -> Ridge). | Penalty = r*λ*sum(abs(β)) + ((1-r)/2)*λ*sum(β^2) |
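A minimal sketch comparing the three shrinkage models in scikit-learn (library choice is an assumption; alpha values are illustrative). Note scikit-learn calls the penalty strength alpha and the mix ratio r l1_ratio:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNet, Lasso, Ridge

X, y = load_diabetes(return_X_y=True)

ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks, never exactly zero
lasso = Lasso(alpha=1.0).fit(X, y)                    # L1: can zero out coefficients
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # l1_ratio is the mix ratio r

print("ridge:", ridge.coef_.round(2))
print("lasso:", lasso.coef_.round(2))
print("enet: ", enet.coef_.round(2))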
Regression Types
Techniques are mostly driven by three metrics: the number of independent variables, the type of dependent variable, and the shape of the regression line.
Linear Regression (Y = a + b*X + e)
Fits a straight line (regression line) | Least Squares Method to find the best-fit line | Assumes a linear relationship between predictors and target | CONS | Multicollinearity, autocorrelation, heteroskedasticity | Very sensitive to outliers |
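A minimal least-squares fit, assuming scikit-learn; the data are synthetic with a = 2, b = 3:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: y = 2 + 3*X + noise
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(0, 1.0, size=50)

model = LinearRegression().fit(X, y)  # Least Squares best-fit line
print("intercept a:", model.intercept_)
print("slope b:", model.coef_[0])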
Logistic Regression
Binary target (0/1): binomial distribution | Logit function | Widely used for classification problems | Can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio | Maximum likelihood estimates | Requires large sample sizes | Ordinal target -> Ordinal logistic regression | Multiclass target -> Multinomial logistic regression |
Logistic Regression
odds = p / (1-p)  # event prob / non-event prob
ln(odds) = ln(p / (1-p))
logit(p) = ln(p / (1-p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk
p is the probability of presence of the characteristic of interest.
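A minimal sketch with scikit-learn's LogisticRegression (library choice is an assumption; the binary target is synthetic), recovering b0 and b1 on the logit scale:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative binary target: P(y=1) follows a logistic curve in x
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
p = 1 / (1 + np.exp(-(0.5 + 2.0 * X[:, 0])))
y = (rng.uniform(size=200) < p).astype(int)

clf = LogisticRegression().fit(X, y)  # fits logit(p) = b0 + b1*X
print("b0:", clf.intercept_[0], "b1:", clf.coef_[0, 0])
print("P(y=1 | x=1):", clf.predict_proba([[1.0]])[0, 1])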