
# Machine Learning Model - Basics/Intermediate Cheat Sheet (DRAFT) by spriiprad

Machine Learning Model and Interpretation

This is a draft cheat sheet. It is a work in progress and is not finished yet.

### Supervised vs. Unsupervised Learning

| Supervised | Unsupervised |
| --- | --- |
| Used in classification and prediction | Used in dimension reduction and clustering |
| Value of outcome must be known | No outcome variable to predict or classify |
| Learns from training data and is applied to validation data | No learning from labeled outcomes |
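
A minimal sketch of both settings with scikit-learn (the synthetic data and model choices are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic data: X holds the features, y the known outcomes
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Supervised: the model learns from the known outcome y
clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]))

# Unsupervised: no outcome variable, only structure within X
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])
```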

### 1. Linear Regression

**Type of Response:** Continuous

| Simple Regression | Multiple Regression |
| --- | --- |
| One independent variable used | Multiple independent variables used |
| Only one dependent variable | Only one dependent variable |
Relationships that are significant in simple linear regression may no longer be significant in multiple linear regression, and vice versa: insignificant relationships in simple linear regression may become significant in multiple linear regression.
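
A minimal scikit-learn sketch of simple vs. multiple regression on synthetic data (the coefficients and noise level are arbitrary assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # two independent variables
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Simple regression: one independent variable, one dependent variable
simple = LinearRegression().fit(X[:, [0]], y)

# Multiple regression: several independent variables, still one dependent variable
multiple = LinearRegression().fit(X, y)
print(simple.coef_, multiple.coef_)
```

Note how the estimated coefficient for the first variable can shift once the second variable enters the model, which is the effect described above.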

### 2. Logistic Regression

**Type of Response:** Categorical

- Can be used for explanatory tasks (profiling) or predictive tasks (classification).
- The predictors are related to the response Y via a nonlinear function called the logit.
- The number of predictors can be reduced via variable selection.

**Types**

1. **Binary Logistic Regression** - two categories. Example: spam or not spam.
2. **Multinomial Logistic Regression** - three or more unordered categories. Example: veg, non-veg, vegan.
3. **Ordinal Logistic Regression** - three or more ordered categories. Example: movie rating from 1 to 5.
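
A minimal binary logistic regression sketch with scikit-learn (the dataset and `max_iter` value are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Binary case: exactly two categories in the response
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The predictors are linked to P(y = 1) through the logit function
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(clf.predict_proba(X_test[:3]))   # class probabilities
print(clf.score(X_test, y_test))       # classification accuracy
```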

### 3. Naive Bayes Classifier

**Type of Response:** Categorical

- A probabilistic machine learning model used for classification tasks.
- The heart of the classifier is Bayes' theorem, which relates the likelihood of an outcome to informative prior information: we can find the probability of A happening given that B has occurred, where B is the evidence and A is the hypothesis.
- "Naive" because the presence of one particular feature is assumed not to affect the others.

**Bayes' Theorem:** P(A|B) = (P(B|A) * P(A)) / P(B)

- Works well when there is a large number of predictor variables, and also when there are missing values.
- The probability estimates are not very accurate, but the classifications or predictions are generally accurate.

**Assumptions**

1. Predictors/features act independently on the target variable.
2. All predictors have an equal effect on the outcome.
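
A minimal Gaussian Naive Bayes sketch with scikit-learn (the dataset choice is an illustrative assumption):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Applies Bayes' theorem under the "naive" assumption that features
# act independently on the target
nb = GaussianNB().fit(X_train, y_train)
print(nb.predict(X_test[:5]))
print(nb.predict_proba(X_test[:1]))  # probability estimates (often poorly calibrated)
```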

### 4. Neural Networks

**Type of Response:** Both categorical and continuous

- Particularly useful for learning complex patterns, using layers of neurons that mathematically transform the data.
- The layers between the input and output are referred to as "hidden layers".
- Learns relationships between features that other algorithms cannot easily discover.

**Architecture of a Neural Net**

| Layer | Description |
| --- | --- |
| Input layer | Nodes (variables) carrying information from the external environment |
| Output layer | Nodes (variables) that send information to the external environment or to another element in the network |
| Hidden layer | Nodes that communicate only with other layers of the network and are not visible to the external environment |
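
A minimal neural network sketch with scikit-learn's MLPClassifier (the hidden-layer size and iteration count are illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Input layer = 64 pixel features, one hidden layer of 32 neurons,
# output layer = 10 digit classes
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```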

### 5. Decision Trees

The decision tree is produced by successively cutting the data set into smaller and smaller chunks, which are increasingly "pure" in terms of the value of the target variable.

| Random Forest (Ensemble Method) | Boosted Trees (Ensemble Method) |
| --- | --- |
| Consists of a large number of individual decision trees that operate as an ensemble | Boosting is a method of converting weak learners into strong learners |
| Each individual tree spits out a class prediction, and the class with the most votes becomes the model's prediction | Builds a large, additive tree by fitting a sequence of smaller trees |
| The predictions (and therefore the errors) made by the individual trees need to have low correlations with each other | Each new tree is fit on a modified version of the original data set |
| Trains each tree independently, using a random sample of the data | GBTs train one tree at a time, where each new tree helps to correct errors made by previously trained trees |
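
A minimal sketch contrasting the two ensembles with scikit-learn (the dataset and `n_estimators` values are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: trees trained independently on random samples, majority vote
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Boosted trees: trees trained one at a time, each correcting earlier errors
gbt = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print(rf.score(X_test, y_test), gbt.score(X_test, y_test))
```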

### 6. K-Nearest Neighbors

**Type of Response:** Both categorical and continuous

- KNN is a method for classifying objects based on their similarity to data with known classifications.
- It makes a prediction for a new observation by searching for the most similar training observations and pooling their values (usually by averaging for numeric responses or voting for classes).
- The training set has to be very large for this to work effectively.
- Redundant and/or irrelevant variables can distort the classification results; the method is sensitive to noise in the data.
- Nominal variables pose problems for measuring distance.
- It is a non-parametric model: it requires no distributional assumptions about the variables and makes no statistical inferences to a population.
- KNN belongs to a family of algorithms known as instance-based or memory-based learning, which classify new objects by their similarity to previously known objects.
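
A minimal KNN sketch with scikit-learn (the scaling step and `n_neighbors=5` are illustrative assumptions; scaling helps because KNN relies on distances):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features so no single variable dominates the distance metric
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # votes among the 5 most similar training points
```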