
Machine Learning Model - Basics/Intermediate Cheat Sheet (DRAFT)

Machine Learning Model and Interpretation

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Supervised vs. Unsupervised Learning

Supervised | Unsupervised
Used for classification and prediction | Used for dimension reduction and clustering
The value of the outcome must be known | There is no outcome variable to predict or classify
Learns from training data and is applied to validation data | No learning from labeled examples
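
A minimal sketch in Python with scikit-learn (illustrative data, not from the original sheet): the supervised model is fitted on features and a known outcome, while the unsupervised model sees only the features.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # known outcome (labels)

clf = LogisticRegression().fit(X, y)       # supervised: learns from (X, y)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised: only X
print(clf.predict(X[:5]), km.labels_[:5])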

Figures in the original sheet (not reproduced here): How Supervised Learning Looks; How Unsupervised Learning Looks; Supervised vs Unsupervised TLDR.

1. Linear Regression

Type of Response: Continuous

Simple Regression | Multiple Regression
One independent variable used | Multiple independent variables used
Only one dependent variable | Only one dependent variable
Relationships that are significant in simple linear regression may no longer be significant in multiple linear regression, and vice versa: insignificant relationships in simple linear regression may become significant in multiple linear regression.
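
A minimal sketch in Python with scikit-learn (the data and coefficients are illustrative, not from the original sheet), contrasting a simple regression on one predictor with a multiple regression on two:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # two independent variables
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=50)  # one dependent variable

simple = LinearRegression().fit(X[:, [0]], y)  # simple: one predictor
multiple = LinearRegression().fit(X, y)        # multiple: both predictors
print(simple.coef_)    # one slope
print(multiple.coef_)  # two slopes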


2. Logistic Regression

Type of Response: Categorical
It can be used for explanatory tasks (profiling) or predictive tasks (classification).
The predictors are related to the response Y via a nonlinear function called the logit (see the sketch below).
The number of predictors can be reduced via variable selection.
Types:
1. Binary Logistic Regression: two categories. Example: spam or not spam.
2. Multinomial Logistic Regression: three or more unordered categories. Example: veg, non-veg, vegan.
3. Ordinal Logistic Regression: three or more ordered categories. Example: movie rating from 1 to 5.
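
A minimal sketch of binary logistic regression in Python with scikit-learn (made-up data; the spam labels are purely illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # predictor variables
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)  # 1 = spam, 0 = not spam

model = LogisticRegression().fit(X, y)
# probabilities come from the inverse logit (sigmoid): p = 1 / (1 + exp(-z))
print(model.predict_proba(X[:3]))
print(model.predict(X[:3]))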
 


3. Naive Bayes Classifier

Type of Response: Categorical
A probabilistic machine learning model used for classification tasks.
The heart of the classifier is Bayes' theorem.
Bayes' theorem provides a way of relating the likelihood of an outcome to informative prior information.
We can find the probability of A happening given that B has occurred, where B is the evidence and A is the hypothesis.
The "naive" part is the assumption that the presence of one particular feature does not affect the others.
Bayes' Theorem Probability Formula
P(A|B) = P(B|A) * P(A) / P(B)
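As a worked example with hypothetical numbers: if 20% of email is spam (P(A) = 0.2), the word "free" appears in 30% of spam (P(B|A) = 0.3), and "free" appears in 10% of all email (P(B) = 0.1), then P(spam | "free") = (0.3 * 0.2) / 0.1 = 0.6.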
Naive Bayes works well when there is a large number of predictor variables.
It also works when there are missing values.
The probability estimates are not very accurate, but the classifications or predictions are generally accurate.
Assumptions
1. Predictors/features act independently on the target variable.
2. All the predictors have an equal effect on the outcome.
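
A minimal sketch, assuming scikit-learn's GaussianNB (one of several Naive Bayes variants; the data is illustrative):

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))  # predictor variables
y = (X[:, 0] > 0).astype(int)  # categorical target

nb = GaussianNB().fit(X, y)
print(nb.predict(X[:5]))        # class predictions are generally reliable
print(nb.predict_proba(X[:5]))  # probability estimates are rougher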


4. Neural Networks

Type of Response: Both categorical and continuous
Learns complex patterns using layers of neurons that mathematically transform the data; this makes it particularly useful when relationships are nonlinear.
The layers between the input and output are referred to as "hidden layers".
Learns relationships between the features that other algorithms cannot easily discover.
Architecture of a Neural Net (see the sketch below)
Input Layer: nodes (variables) that receive information from the external environment.
Output Layer: nodes (variables) that send information to the external environment or to another element in the network.
Hidden Layer: nodes that communicate only with other layers of the network and are not visible to the external environment.
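
A minimal sketch, assuming scikit-learn's MLPClassifier (the hidden-layer size and data are illustrative):

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # input layer: 2 nodes (features)
y = ((X[:, 0] ** 2 + X[:, 1] ** 2) > 1).astype(int)  # nonlinear pattern

# one hidden layer of 16 neurons sits between the input and output layers
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
print(net.score(X, y))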
 


5. Decision Trees

A decision tree is produced by successively splitting the data set into smaller and smaller chunks which are increasingly "pure" in terms of the value of the target variable.

Random Forest (Ensemble Method) | Boosted Trees (Ensemble Method)
Consists of a large number of individual decision trees that operate as an ensemble | Boosting is a method of converting weak learners into strong learners
Each individual tree outputs a class prediction, and the class with the most votes becomes the model's prediction | A large, additive tree is built by fitting a sequence of smaller trees
The predictions (and therefore the errors) made by the individual trees need to have low correlations with each other | Each new tree is fit on a modified version of the original data set
Random forests train each tree independently, using a random sample of the data | GBTs train one tree at a time, where each new tree helps to correct errors made by previously trained trees
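
A minimal sketch, assuming scikit-learn's ensemble module (parameters are illustrative), contrasting the two ensemble styles:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# random forest: trees trained independently on bootstrap samples, then majority vote
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# boosted trees: trees trained sequentially, each correcting its predecessors' errors
gbt = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X, y)
print(rf.score(X, y), gbt.score(X, y))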


6. K-Nearest Neighbors

Type of Response: Both categorical and continuous
KNN is a method for classifying objects based on their similarity to data with known classifications.
K-Nearest Neighbors (KNN) makes a prediction for a new observation by searching for the most similar training observations and pooling their values (typically a majority vote for classification or the mean for regression).
The training set has to be very large for this to work effectively.
Redundant and/or irrelevant variables can distort the classification results; the method is sensitive to noise in the data.
Nominal variables pose problems for measuring distance.
It is a non-parametric model: it does not require distributional assumptions about the variables and does not make statistical inferences about a population.
KNN is an example of a family of algorithms known as instance-based or memory-based learning, which classify new objects by their similarity to previously known objects.
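
A minimal sketch, assuming scikit-learn's KNeighborsClassifier (k = 5 and the data are illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # training observations with known classifications
y = (X[:, 0] + X[:, 1] > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # memory-based: stores the training data
new_point = np.array([[0.3, -0.1]])
print(knn.predict(new_point))  # vote among the 5 most similar training observations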