Cheatography
https://cheatography.com
Machine Learning Model and Interpretation
This is a draft cheat sheet. It is a work in progress and is not finished yet.
Supervised vs Unsupervised Learning
Supervised | Unsupervised
Used in classification and prediction | Used in dimension reduction and clustering
Value of the outcome must be known | No outcome variable to predict or classify
Learns from training data, then is applied to validation data | No labelled outcome to learn from; looks for structure in the data itself
[Figure: How Supervised Learning Looks]
[Figure: How Unsupervised Learning Looks]
[Figure: Supervised vs Unsupervised TLDR]
1. Linear Regression
Type of Response: Continuous
Simple Regression | Multiple Regression
One independent variable used | Multiple independent variables used
Only one dependent variable | Only one dependent variable
Relationships that are significant in simple linear regression may no longer be significant in multiple linear regression, and vice versa: relationships that are insignificant in simple linear regression may become significant once other predictors are included.
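The simple-regression case (one independent variable, one continuous dependent variable) can be sketched with ordinary least squares. This is a minimal illustration, not a full statistical fit: the function name `fit_simple_regression` and the data are made up for the example, and no significance test is computed.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical, noise-free data generated from y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_regression(x, y)
print(intercept, slope)
```

Multiple regression follows the same idea with several independent variables, which is why a coefficient (and its significance) can change once correlated predictors are added.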
[Figure: How Logistic Regression Works]
2. Logistic Regression
Type of Response: Categorical
It can be used for explanatory tasks (=profiling) or predictive tasks (=classification) 
The predictors are related to the response Y via a nonlinear function called the logit: the log-odds of Y are modeled as a linear function of the predictors 
The number of predictors can be reduced via variable selection 
Types 
1. Binary Logistic Regression: two categories (example: spam or not spam) 
2. Multinomial Logistic Regression: three or more unordered categories (example: Veg, NonVeg, Vegan) 
3. Ordinal Logistic Regression: three or more ordered categories (example: movie rating from 1 to 5) 
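The logit link in the binary case can be sketched as follows. The coefficients below are hypothetical values chosen for illustration, not fitted to any data, and `predict_proba` is just a name used here.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of the logit: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """P(Y = 1 | features) for a binary logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical coefficients (assumed, not estimated from data).
weights = [0.8, -1.5]
bias = 0.2

p = predict_proba([1.0, 0.5], weights, bias)
label = "spam" if p >= 0.5 else "not spam"
print(round(p, 3), label)
```

The linear score is the log-odds; applying `sigmoid` turns it into a probability, and a cutoff (0.5 here, an assumption) turns the probability into a class.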


3. Naive Bayes Classifier
Type of Response: Categorical
A probabilistic machine learning model used for classification tasks. 
The classifier is built on Bayes' theorem. 
Bayes' theorem relates the probability of an outcome to prior information about it. 
It gives the probability of A happening, given that B has occurred. 
B is the evidence and A is the hypothesis. 
The "naive" part is the assumption that features are independent: the presence of one particular feature does not affect the others. 
Bayes' Theorem Probability Formula 
P(A|B) = (P(B|A) * P(A)) / P(B) 
Naive Bayes works well when there is a large number of predictor variables 
It also works when there are missing values. 
The probability estimates are not very accurate 
The classifications or predictions are generally accurate. 
Assumptions 
1. Predictors/features work independently on the target variable. 
2. All the predictors have an equal effect on the outcome. 
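A small worked example of Bayes' theorem and of the independence assumption, using made-up probabilities for a spam filter. The function names and all numbers are assumptions for illustration only.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, evidence_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / evidence_b

def naive_bayes_posteriors(priors, likelihoods, observed):
    """Class posteriors under the naive (independent features) assumption."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            # Independence lets us multiply the per-feature likelihoods.
            score *= likelihoods[cls][feature]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

# Hypothetical numbers: 20% of mail is spam; "offer" appears in 50% of
# spam and 5% of non-spam (ham) messages.
p_spam = 0.2
p_word = 0.5 * p_spam + 0.05 * (1 - p_spam)   # total probability of the evidence
p_spam_given_word = bayes_posterior(p_spam, 0.5, p_word)
print(round(p_spam_given_word, 3))

priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": {"offer": 0.5, "free": 0.4},
    "ham": {"offer": 0.05, "free": 0.1},
}
posteriors = naive_bayes_posteriors(priors, likelihoods, ["offer", "free"])
print(max(posteriors, key=posteriors.get))
```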
4. Neural Networks
Type of Response: Both categorical and continuous (particularly useful)
Learns complex patterns using layers of neurons which mathematically transform the data. 
The layers between the input and output are referred to as “hidden layers”. 
Learns relationships between the features that other algorithms cannot easily discover. 
Architecture of Neural Net 
Input Layer: Nodes (variables) with information from the external environment 
Output Layer: Nodes (variables) that send information to the external environment or to another element in the network 
Hidden Layer: Nodes that only communicate with other layers of the network and are not visible to the external environment 
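The input, hidden, and output layers can be sketched as a single forward pass. This is a minimal illustration with hypothetical, untrained weights and a sigmoid activation chosen as an assumption; it shows how data flows through the layers, not how the weights are learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> one hidden layer -> output layer."""
    hidden = dense_layer(inputs, hidden_w, hidden_b, sigmoid)
    return dense_layer(hidden, output_w, output_b, sigmoid)

# Hypothetical weights: 2 input nodes, 3 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_b = [0.05]

output = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(output)
```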


[Figure: How Decision Trees Work]
[Figure: Different Types of Trees]
[Figure: How Ensemble Model Works]
5. Decision Trees
The decision tree is produced by successively cutting the data set into smaller and smaller chunks, which are increasingly "pure" in terms of the value of the target variable. 
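One way to make "pure" concrete is Gini impurity, a commonly used purity measure (an assumption here; the sheet does not name a specific measure). The sketch below scores candidate cuts on a single numeric variable and keeps the one that yields the purest chunks. The data and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure chunk, larger for mixed chunks."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Size-weighted impurity of the two chunks made by values <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Try each observed value as a cut point and keep the purest split."""
    candidates = sorted(set(values))[:-1]  # cutting at the max leaves one side empty
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

# Hypothetical data: customer age and whether they bought.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold = best_threshold(ages, bought)
print(gini(bought), threshold, split_impurity(ages, bought, threshold))
```

A full tree repeats this search on each resulting chunk until the chunks are pure enough or a stopping rule is met.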
Random Forest (Ensemble Method) | Boosted Trees (Ensemble Method)
Consists of a large number of individual decision trees that operate as an ensemble | Boosting is a method of converting weak learners into strong learners
Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model's prediction | Boosted trees build a large, additive model by fitting a sequence of smaller trees
The predictions (and therefore the errors) made by the individual trees need to have low correlations with each other | In boosting, each new tree is fit on a modified version of the original data set
Random forests train each tree independently, using a random sample of the data | Gradient-boosted trees (GBTs) train one tree at a time, where each new tree helps to correct errors made by previously trained trees
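The two ensemble ideas can be sketched side by side: majority voting over independent predictions, and boosting as a sequence of small trees fit to the remaining errors. This is a simplified sketch under stated assumptions: one-split "stumps" stand in for trees, squared error is used, and the learning rate, data, and function names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Random-forest style aggregation: the most common class wins."""
    return Counter(predictions).most_common(1)[0][0]

def fit_stump(x, residuals):
    """Best one-split regression stump on 1-D data (needs >= 2 distinct x values)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_trees=20, learning_rate=0.5):
    """Each new stump is fit to the errors left by the stumps before it."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

def mse(model, x, y):
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)

# Hypothetical data with a step between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

mean_y = sum(y) / len(y)
baseline_mse = mse(lambda xi: mean_y, x, y)
boosted_mse = mse(boost(x, y), x, y)
print(majority_vote(["spam", "ham", "spam"]), baseline_mse, boosted_mse)
```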
6. K-Nearest Neighbors
Type of Response: Both categorical and continuous
KNN is a method for classifying objects based on their similarity to data with known classifications. 
K-Nearest Neighbors (KNN) makes a prediction for a new observation by finding the most similar training observations and pooling their values (usually a majority vote for a categorical response, or the mean for a continuous one) 
Training set has to be very large for this to work effectively 
Redundant and/or irrelevant variables can distort the classification results; the method is sensitive to noise in the data. 
Nominal variables pose problems for measuring distance 
It is a non-parametric model: it does not require distribution assumptions regarding the variables and does not make statistical inferences to a population 
KNN is an example of a family of algorithms known as instance-based or memory-based learning, which classify new objects by their similarity to previously known objects. 
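A minimal KNN sketch for both response types, assuming numeric features and Euclidean distance (one common choice; other distance measures are possible). The data and function names are made up for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest(train_x, train_y, query, k):
    """Outcomes of the k training observations closest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    return [outcome for _, outcome in ranked[:k]]

def knn_classify(train_x, train_y, query, k=3):
    """Categorical response: majority vote among the k neighbors."""
    return Counter(nearest(train_x, train_y, query, k)).most_common(1)[0][0]

def knn_regress(train_x, train_y, query, k=3):
    """Continuous response: mean of the k neighbors' values."""
    neighbors = nearest(train_x, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Hypothetical training data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
classes = ["a", "a", "a", "b", "b", "b"]
values = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]

predicted_class = knn_classify(points, classes, (1.2, 1.4))
predicted_value = knn_regress(points, values, (6.4, 6.4))
print(predicted_class, predicted_value)
```

There is no training step beyond storing the data, which is why the method is called memory-based; every prediction scans the stored observations.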
