
Google Machine Learning Crash Course Cheat Sheet (DRAFT) by

Machine Learning Crash Course Cheat Sheet. Course Link:

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Machine Learning Terminology

Label is the variable we’re predicting, represented by y.
Features are the input variables describing the data, represented by {x1, x2, …, xn}.
Example is a particular instance of data, x.
- Labeled example has {features, label}: (x, y)
   Used to train the model.
- Unlabeled example has {features, ?}: (x, ?)
   Used for making predictions on new data.
Model maps examples to predicted labels, y’. It is defined by internal parameters, which are learned.
Training means creating or learning the model. You show the model labeled examples and enable the model to learn the relationship between features and label.
Inference means applying the trained model to unlabeled examples. You use the trained model to make useful predictions (y’).
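To make these terms concrete, here is a minimal Python sketch. The dataset, the function names `train` and `infer`, and the one-weight model y' = w * x1 (deliberately without a bias term) are hypothetical choices for illustration; training here uses a closed-form least-squares fit rather than the iterative approach described under Reducing Loss.

```python
# Hypothetical toy data: one feature x1 (house size in m^2), label y (price in $1000s).
labeled_examples = [(50.0, 150.0), (80.0, 240.0), (120.0, 360.0)]  # (x, y) pairs
unlabeled_examples = [100.0]  # (x, ?): features only, label unknown

def train(examples):
    """Training: learn the single weight w of the tiny model y' = w * x1
    from labeled examples (least-squares slope through the origin)."""
    num = sum(x * y for x, y in examples)
    den = sum(x * x for x, _ in examples)
    return num / den

def infer(w, x):
    """Inference: apply the trained model to an unlabeled example to get y'."""
    return w * x

w = train(labeled_examples)                              # the learned parameter
predictions = [infer(w, x) for x in unlabeled_examples]  # predicted labels y'
print(w, predictions)
```

Because every toy label is exactly 3 times its feature, the learned weight is 3.0 and the prediction for the unlabeled 100 m^2 house is 300.0.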
Regression model predicts continuous values.
For example: What is the value of a house in California?
Classification model predicts discrete values.
For example: Is a given email message spam or not spam?
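A minimal sketch of the difference in output type, assuming made-up rules: the regression function can return any real number, while the classification function returns one of two labels. The coefficients, the `spam_score` input, and the 0.5 threshold are illustrative assumptions, not values from the course.

```python
def predict_house_value(size_m2: float) -> float:
    """Regression: the output can be any real number (a continuous value).
    The coefficients 3.0 and 20.0 are made up for illustration."""
    return 3.0 * size_m2 + 20.0

def classify_email(spam_score: float, threshold: float = 0.5) -> str:
    """Classification: the output is one of a fixed set of discrete classes.
    spam_score and threshold are hypothetical inputs."""
    return "spam" if spam_score >= threshold else "not spam"

print(predict_house_value(100.0))                # a continuous value
print(classify_email(0.9), classify_email(0.1))  # discrete labels
```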
Hyperparameters are the knobs that programmers tweak in machine learning algorithms.

Model and Equation

Equation for a model in machine learning:
$$y' = b + w_1 x_1$$
where:
y' is the predicted label.
b is the bias, also referred to as w0.
w1 is the weight of feature x1.
x1 is a feature (a known input).
Some models have multiple features. For example, a model that relies on three features looks as follows:
$$y' = b + w_1 x_1 + w_2 x_2 + w_3 x_3$$
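The model y' = b + w1x1, and its multi-feature form y' = b + w1x1 + w2x2 + … + wnxn, can be sketched as one Python function that accepts any number of features. `predict` and its argument names are hypothetical; the numbers are arbitrary and chosen so the arithmetic is easy to check by hand.

```python
def predict(b, weights, features):
    """Linear model: y' = b + w1*x1 + w2*x2 + ... + wn*xn."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per feature")
    return b + sum(w * x for w, x in zip(weights, features))

# One feature: y' = b + w1*x1
print(predict(1.0, [2.0], [3.0]))                      # 1 + 2*3 = 7.0
# Three features: y' = b + w1*x1 + w2*x2 + w3*x3
print(predict(0.5, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 0.5 + 4 + 10 + 18 = 32.5
```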

Training and Loss

Training a model means learning values for all the weights and bias from labeled examples.
Loss is a number indicating how bad the model’s prediction was on a single example. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples.
Mean squared error (MSE) is the average squared loss per example over the whole dataset.
$$\mathrm{MSE} = \frac{1}{N} \sum_{(x,y) \in D} \left(y - \mathrm{prediction}(x)\right)^2$$
  - x is the set of features the model uses to make predictions.
  - y is the example’s label.
  - prediction(x) is a function of the weights and bias in combination with the set of features x.
  - D is a dataset containing labeled examples, which are (x, y) pairs.
  - N is the number of examples in D.
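The MSE formula translates almost directly into code. This is a sketch; the dataset D and the fixed model y' = 2x are hypothetical, chosen so that every example is off by exactly 1, so each squared loss is 1 and their average is 1.0.

```python
def mse(dataset, prediction):
    """Mean squared error: (1/N) * sum over (x, y) in D of (y - prediction(x))**2."""
    n = len(dataset)
    if n == 0:
        raise ValueError("D must contain at least one labeled example")
    return sum((y - prediction(x)) ** 2 for x, y in dataset) / n

# Hypothetical dataset D of (x, y) pairs and a fixed model y' = 2*x.
D = [(1.0, 3.0), (2.0, 3.0), (3.0, 7.0)]
print(mse(D, lambda x: 2.0 * x))  # squared losses 1, 1, 1 -> average 1.0
```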

Reducing Loss

Training iterates until the algorithm discovers the model parameters with the lowest possible loss; in practice, usually until the overall loss stops changing or at least changes extremely slowly. When that happens, we say that the model has converged.
The gradient descent algorithm calculates the gradient of the loss curve. When there is a single weight, the gradient of the loss is the derivative (slope) of the curve; when there are multiple weights, the gradient is a vector of partial derivatives with respect to the weights.
Because the gradient is a vector, it has both a direction and a magnitude.
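For the one-feature model y' = b + w1x1 with MSE loss, the gradient is the vector of partial derivatives (∂MSE/∂b, ∂MSE/∂w1), where ∂MSE/∂b = -(2/N) Σ (y - y') and ∂MSE/∂w1 = -(2/N) Σ x1 (y - y'). The sketch below computes it analytically and checks it against a central-difference approximation; the data points and function names are hypothetical. The magnitude is the vector's Euclidean length, and dividing by it gives the unit direction of steepest increase.

```python
import math

def mse(b, w1, data):
    """MSE of the one-feature model y' = b + w1*x1 over (x, y) pairs."""
    return sum((y - (b + w1 * x)) ** 2 for x, y in data) / len(data)

def gradient(b, w1, data):
    """Analytic gradient of MSE: the vector (dMSE/db, dMSE/dw1)."""
    n = len(data)
    d_b = -2.0 * sum(y - (b + w1 * x) for x, y in data) / n
    d_w1 = -2.0 * sum(x * (y - (b + w1 * x)) for x, y in data) / n
    return d_b, d_w1

def numerical_gradient(b, w1, data, h=1e-6):
    """Central-difference estimate of the same partial derivatives (a sanity check)."""
    d_b = (mse(b + h, w1, data) - mse(b - h, w1, data)) / (2 * h)
    d_w1 = (mse(b, w1 + h, data) - mse(b, w1 - h, data)) / (2 * h)
    return d_b, d_w1

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]    # hypothetical points on y = 1 + 2x
g = gradient(0.0, 0.0, data)                   # gradient at b = 0, w1 = 0
magnitude = math.hypot(*g)                     # length of the gradient vector
direction = tuple(gi / magnitude for gi in g)  # unit vector of steepest increase
print(g, numerical_gradient(0.0, 0.0, data), magnitude, direction)
```

Both partial derivatives are negative at (0, 0), so the negative gradient points toward larger b and w1, which is toward the values (1, 2) that generated the toy data.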
The gradient always points in the direction of steepest increase. The gradient descent algorithm takes a step in the direction of the negative gradient in order to reduce loss as quickly as possible.
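Putting these pieces together, a minimal gradient descent loop repeatedly steps against the gradient and stops once the loss changes by less than a small tolerance, which is one practical reading of "converged". The learning rate, tolerance, step cap, starting point (0, 0), and data are all hypothetical choices; the learning rate is an example of a hyperparameter. The toy data lie exactly on y = 1 + 2x, so the loop should approach b ≈ 1 and w1 ≈ 2.

```python
def mse(b, w1, data):
    """MSE of the one-feature model y' = b + w1*x1."""
    return sum((y - (b + w1 * x)) ** 2 for x, y in data) / len(data)

def gradient(b, w1, data):
    """Partial derivatives (dMSE/db, dMSE/dw1)."""
    n = len(data)
    d_b = -2.0 * sum(y - (b + w1 * x) for x, y in data) / n
    d_w1 = -2.0 * sum(x * (y - (b + w1 * x)) for x, y in data) / n
    return d_b, d_w1

def gradient_descent(data, learning_rate=0.05, tolerance=1e-12, max_steps=100_000):
    """Step in the direction of the negative gradient until the loss changes
    by less than `tolerance` between steps (treated here as convergence)."""
    b, w1 = 0.0, 0.0
    loss = mse(b, w1, data)
    for step in range(1, max_steps + 1):
        d_b, d_w1 = gradient(b, w1, data)
        b -= learning_rate * d_b      # move against the gradient
        w1 -= learning_rate * d_w1
        new_loss = mse(b, w1, data)
        if abs(loss - new_loss) < tolerance:
            return b, w1, new_loss, step
        loss = new_loss
    return b, w1, loss, max_steps     # step cap reached without converging

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # hypothetical points on y = 1 + 2x
b, w1, loss, steps = gradient_descent(data)
print(f"b={b:.3f} w1={w1:.3f} loss={loss:.2e} after {steps} steps")
```

If the learning rate is too large, the updates can overshoot and the loss can grow instead of shrink; if it is too small, convergence takes many more steps.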

Gradient Descent