Machine Learning Models and Interpretation
Supervised vs Unsupervised Learning

Supervised                                        | Unsupervised
Used in classification and prediction             | Used in dimension reduction and clustering
Value of the outcome must be known                | No outcome variable to predict or classify
Learns from training data, applied to validation  | No learning from a labelled training set
[Image: How Supervised Learning Looks]
[Image: How Unsupervised Learning Looks]
[Image: Supervised vs Unsupervised, TL;DR]
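A minimal sketch of the difference, using scikit-learn (an assumption; any library with the same fit interface would illustrate it, and the data here is made up). The supervised model needs the outcome y at training time; the unsupervised one only sees the features:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Hypothetical data: 100 observations with 2 features each
X = np.random.rand(100, 2)
y = (X[:, 0] > 0.5).astype(int)   # known outcome, required for supervised learning

# Supervised: learns the mapping from X to the known outcome y
clf = LogisticRegression().fit(X, y)

# Unsupervised: no outcome variable; the algorithm just groups the data
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(clf.predict(X[:3]), km.labels_[:3])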
1. Linear Regression

Type of response: Continuous

Simple Regression                | Multiple Regression
One independent variable used    | Multiple independent variables used
Only one dependent variable      | Only one dependent variable
A relationship that is significant under simple linear regression may no longer be significant under multiple linear regression, and vice versa: a predictor that is insignificant on its own may become significant once other variables are included.
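A minimal sketch of both models with statsmodels (an assumption; the data and variable names are made up). Comparing the p-values of the two fits shows how significance can change between the simple and multiple settings:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # a predictor highly correlated with x1
y = 2 * x1 + rng.normal(size=200)

# Simple regression: one independent variable
simple = sm.OLS(y, sm.add_constant(x1)).fit()

# Multiple regression: several independent variables, one dependent variable
X = sm.add_constant(np.column_stack([x1, x2]))
multiple = sm.OLS(y, X).fit()

print(simple.pvalues)     # x1 is highly significant on its own
print(multiple.pvalues)   # with the correlated x2 added, significance can change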
[Image: How Logistic Regression Works]
2. Logistic Regression

Type of response: Categorical

- Can be used for explanatory tasks (profiling) or predictive tasks (classification)
- The predictors are related to the response Y via a nonlinear function called the logit
- The number of predictors can be reduced via variable selection

Types:
1. Binary logistic regression: two categories. Example: spam or not spam
2. Multinomial logistic regression: three or more unordered categories. Example: veg, non-veg, vegan
3. Ordinal logistic regression: three or more ordered categories. Example: movie rating from 1 to 5
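The logit link mentioned above is log(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk. A minimal binary-classification sketch with scikit-learn (an assumption; the one-feature "spam score" data and labels are made up):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary task: spam = 1, not spam = 0
X = np.array([[0.1], [0.4], [0.5], [0.9], [1.2], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# The model is linear on the logit scale:
# log(p / (1 - p)) = b0 + b1 * x, so p = 1 / (1 + exp(-(b0 + b1 * x)))
print(model.intercept_, model.coef_)   # b0, b1
print(model.predict_proba([[0.7]]))    # class probabilities for a new point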
3. Naive Bayes Classifier

Type of response: Categorical

- A probabilistic machine learning model used for classification tasks
- The heart of the classifier is Bayes' theorem, which relates the likelihood of an outcome to informative prior information
- It gives the probability of A happening given that B has occurred, where B is the evidence and A is the hypothesis
- "Naive" refers to the independence assumption: the presence of one particular feature does not affect another

Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)

- Naive Bayes works well when there is a large number of predictor variables
- It also works when there are missing values
- The probability estimates are not very accurate, but the classifications or predictions generally are

Assumptions:
1. Predictors/features act independently on the target variable
2. All predictors have an equal effect on the outcome
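A minimal sketch with scikit-learn's GaussianNB (one of several Naive Bayes variants; the toy features and labels are made up):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical features and labels; GaussianNB treats each feature
# as independent given the class (the "naive" assumption)
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 4.2], [3.1, 3.9]])
y = np.array([0, 0, 1, 1])

nb = GaussianNB().fit(X, y)

# Per-class probabilities follow Bayes' theorem:
# P(class | x) is proportional to P(x | class) * P(class)
print(nb.predict_proba([[1.1, 2.0]]))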
4. Neural Networks

Type of response: Both categorical and continuous, which makes them particularly useful

- Learn complex patterns using layers of neurons that mathematically transform the data
- The layers between the input and output are referred to as "hidden layers"
- Learn relationships between features that other algorithms cannot easily discover

Architecture of a neural net:
- Input layer: nodes (variables) that receive information from the external environment
- Output layer: nodes (variables) that send information to the external environment or to another element in the network
- Hidden layer: nodes that only communicate with other layers of the network and are not visible to the external environment
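A minimal sketch using scikit-learn's MLPClassifier (an assumption; any neural-net library would show the same input -> hidden -> output structure, and the data here is synthetic):

from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Synthetic data; one hidden layer of 8 neurons sits between input and output
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)

# Input layer: 4 feature nodes -> hidden layer: 8 nodes -> output layer
print(net.n_layers_)    # total layer count, including input and output
print(net.score(X, y))  # training accuracy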
[Image: How Decision Trees Work]
[Image: Different Types of Trees]
[Image: How Ensemble Models Work]
5. Decision Trees

A decision tree is produced by successively cutting the data set into smaller and smaller chunks that are increasingly "pure" in terms of the value of the target variable. Two popular ensemble methods build on decision trees:

Random Forest (ensemble method):
- Consists of a large number of individual decision trees that operate as an ensemble
- Each tree spits out a class prediction, and the class with the most votes becomes the model's prediction
- The predictions (and therefore the errors) made by the individual trees need to have low correlations with each other
- Each tree is trained independently, using a random sample of the data

Boosted Trees (ensemble method):
- Boosting is a method of converting weak learners into strong learners
- Builds a large, additive tree by fitting a sequence of smaller trees
- Each new tree is fit on a modified version of the original data set
- Gradient-boosted trees (GBTs) train one tree at a time, where each new tree helps to correct the errors made by previously trained trees
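A minimal side-by-side sketch with scikit-learn (synthetic data; these two classifiers are standard implementations of the two ensemble styles described above):

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=0)

# Random forest: many trees trained independently on random samples;
# the majority vote across trees is the prediction
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gradient-boosted trees: trees trained one at a time, each new tree
# fit to the errors of the ensemble so far
gbt = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

print(rf.predict(X[:3]), gbt.predict(X[:3]))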
6. K-Nearest Neighbors

Type of response: Both categorical and continuous

- KNN is a method for classifying objects based on their similarity to data with known classifications
- It makes a prediction for a new observation by searching for the most similar training observations and pooling their values (usually by taking the mean for a continuous response, or a majority vote for a categorical one)
- The training set has to be very large for this to work effectively
- Redundant and/or irrelevant variables can distort the classification results; the method is sensitive to noise in the data
- Nominal variables pose problems for measuring distance
- It is a nonparametric model: it requires no distributional assumptions about the variables and makes no statistical inferences to a population
- KNN is an example of a family of algorithms known as instance-based or memory-based learning, which classify new objects by their similarity to previously known objects
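A minimal sketch with scikit-learn (the one-feature toy data is made up and k = 3 is an arbitrary choice):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy labelled data; KNN stores it and classifies new points by the
# majority class among the k nearest training observations
X = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.8]]))   # the 3 nearest neighbours are all class 0

# For a continuous response, KNeighborsRegressor averages the
# neighbours' values instead of voting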
