Translate_Stats_ML Cheat Sheet by

Translating Between Statistics and Machine Learning


Statistics | Machine learning | Notes
data point, record, row of data
example, instance
Both domains also use "observation," which can refer to a single measurement or an entire vector of attributes depending on context.
response variable, dependent variable
label, output
Both domains also use "target." Since practically all variables depend on other variables, the term "dependent variable" is potentially misleading.
variable, covariate, predictor, independent variable
feature, side information, input
The term "independent variable" exists for historical reasons but is usually misleading: such a variable typically depends on other variables in the model.
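As a rough illustration of how the vocabulary above maps onto the arrays most libraries expect, here is a minimal sketch. It assumes NumPy, and the numbers are made up purely for illustration; the names `X` and `y` follow common convention and are not taken from the cheat sheet.

```python
import numpy as np

# Each row is one data point / record (statistics) or example / instance (ML).
# Each column is one covariate / predictor (statistics) or feature / input (ML).
X = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 2.9],
])  # made-up values, for illustration only

# One response variable value (statistics) or label (ML) per row.
y = np.array([0, 0, 1])

n_examples, n_features = X.shape
first_example = X[0]      # a single record: a whole vector of attributes
first_feature = X[:, 0]   # one predictor measured across all records
```

Note that "observation" could plausibly describe either `first_example` or a single entry such as `X[0, 0]`, which is the ambiguity mentioned above.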
supervised learners, machines
Both estimate output(s) in terms of input(s).
Both translate data into quantitative claims, becoming more accurate as the supply of relevant data increases.
hypothesis ≠ classifier
In both statistics and ML, a hypothesis is a scientific statement to be scrutinized, such as "The true value of this parameter is zero."

In ML (but not in statistics), a hypothesis can also refer to the prediction rule that is output by a classifier algorithm.
bias ≠ regression intercept
Statistics distinguishes between
(a) bias as a form of estimation error and
(b) the default prediction of a linear model in the special case where all inputs are 0.
ML sometimes uses "bias" to refer to both of these concepts, although the best ML researchers certainly understand the difference.
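The two meanings can be separated with a short simulation. This is a sketch assuming NumPy; the divide-by-n variance estimator is just one convenient textbook example of estimation bias, and all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# (a) Bias as estimation error: the divide-by-n variance estimator
# systematically underestimates the true variance.
true_variance = 4.0
samples = rng.normal(loc=0.0, scale=np.sqrt(true_variance), size=(20000, 5))
biased_estimates = samples.var(axis=1, ddof=0)    # divide by n
unbiased_estimates = samples.var(axis=1, ddof=1)  # divide by n - 1
estimation_bias = biased_estimates.mean() - true_variance  # roughly -true_variance / n

# (b) Intercept: the prediction of a linear model when every input is 0.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.5 + 2.0 * x                                  # noiseless line, for clarity
design = np.column_stack([np.ones_like(x), x])     # column of ones carries the intercept
intercept, slope = np.linalg.lstsq(design, y, rcond=None)[0]
prediction_at_zero = intercept + slope * 0.0
```

Here `estimation_bias` measures how far an estimator is off on average, while `intercept` is simply a fitted parameter; calling both "bias" hides that they answer different questions.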
Maximize the likelihood to estimate model parameters
If your target distribution is discrete (such as in logistic regression), minimize the cross-entropy to derive the best parameters.

If your target distribution is continuous, fine, just maximize the likelihood.
For discrete distributions, maximizing the likelihood is equivalent to minimizing the cross-entropy between the observed labels and the model's predicted probabilities.
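The equivalence can be checked numerically for binary labels: the average cross-entropy is the negative log-likelihood divided by the number of examples, so both criteria pick the same parameter. The sketch below assumes NumPy, made-up labels, and a deliberately simple model that predicts one constant probability for every example.

```python
import numpy as np

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary labels y under predicted probabilities p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def mean_cross_entropy(y, p):
    """Average cross-entropy between labels y and predicted probabilities p."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1, 0])               # made-up labels
candidates = np.linspace(0.05, 0.95, 19)    # constant-probability models to compare

log_liks = np.array(
    [bernoulli_log_likelihood(y, np.full(y.shape, q)) for q in candidates]
)
cross_ents = np.array(
    [mean_cross_entropy(y, np.full(y.shape, q)) for q in candidates]
)

best_by_likelihood = candidates[np.argmax(log_liks)]
best_by_cross_entropy = candidates[np.argmin(cross_ents)]
```

Both selections land on the same candidate, the observed fraction of positive labels.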
Apply Occam's razor, or encode missing prior information with suitably uninformative priors
The principle of maximum entropy is conceptual and does not refer to maximizing a concrete objective function. The principle is that models should be conservative: they should be no more confident in their predictions than is thoroughly justified by the data. In practice this works out as deriving an estimation procedure from a bare-minimum set of criteria.
logistic/multinomial regression
maximum entropy, MaxEnt
They are equivalent except in special multinomial settings like ordinal logistic regression. Note that maximum entropy here refers to the principle of maximum entropy, not the form of the objective function. Indeed, fitting a MaxEnt model means minimizing a cross-entropy expression, not maximizing an entropy.
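One way to see the connection numerically: the MaxEnt formulation constrains the model's expected feature totals to match the empirical ones, and a maximum-likelihood logistic fit satisfies those same constraints at its optimum. The sketch below assumes NumPy and a tiny made-up data set; `fit_logistic_regression` is a hypothetical helper written for this example (plain Newton's method), not any library's API.

```python
import numpy as np

def fit_logistic_regression(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton's method."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        gradient = X.T @ (y - p)                      # gradient of the log-likelihood
        hessian = X.T @ (X * (p * (1 - p))[:, None])  # negative Hessian of the log-likelihood
        w = w + np.linalg.solve(hessian, gradient)
    return w

# Small, non-separable, made-up data set: a constant column plus one feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0],
              [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

w = fit_logistic_regression(X, y)
p = 1.0 / (1.0 + np.exp(-X @ w))

empirical_feature_totals = X.T @ y   # what the data say
model_feature_totals = X.T @ p       # what the fitted model expects
```

At convergence the two totals agree, which is the moment-matching condition that the MaxEnt derivation starts from.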
X causes Y if surgical (or randomized controlled) manipulations in X are correlated with changes in Y
X causes Y if it doesn't obviously not cause Y. For example, X causes Y if X precedes Y in time (or is at least contemporaneous)
The stats definition is more aligned with common-sense intuition than the ML one proposed here. In fairness, not all ML practitioners are so abusive of causation terminology, and some of the blame belongs with even earlier abuses such as Granger causality.
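A small simulation shows why the manipulation-based definition matters. In this sketch (assuming NumPy; the data-generating process is invented for illustration), a hidden variable `z` drives both X and Y, and X has no effect on Y at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50000

# Observational data: a hidden confounder z drives both x and y.
z = rng.normal(size=n)
x_observed = z + 0.5 * rng.normal(size=n)
y_observed = z + 0.5 * rng.normal(size=n)        # y never looks at x
observational_corr = np.corrcoef(x_observed, y_observed)[0, 1]

# Randomized experiment: x is assigned by coin flip, independently of z.
x_assigned = rng.integers(0, 2, size=n).astype(float)
y_experiment = z + 0.5 * rng.normal(size=n)      # same mechanism: y still ignores x
experimental_corr = np.corrcoef(x_assigned, y_experiment)[0, 1]
```

The observational correlation is strong (about 0.8 in theory for this setup), yet it vanishes once X is set by randomization, so under the statistical definition X does not cause Y.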
structural equations model
Bayesian network
These are nearly equivalent mathematically, although interpretations differ by use case.
sequential experimental design
active learning, reinforcement learning, hyperparameter optimization
Although these four subfields are very different from each other in terms of their standard use cases, they all address problems of optimization via a sequence of queries/experiments.
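The shared pattern, choosing each new query based on the answers so far, can be sketched with a toy active-learning problem: locating a one-dimensional decision threshold by asking for labels. The function `learn_threshold`, the oracle, and the hidden threshold value are all hypothetical and exist only for this example.

```python
def learn_threshold(query_label, low=0.0, high=1.0, n_queries=20):
    """Locate a decision threshold on [low, high] by sequential label queries.

    query_label(x) returns 1 if x is at or above the hidden threshold, else 0.
    Each query is placed at the midpoint of the interval that still contains
    the threshold, so every answer halves the remaining uncertainty.
    """
    for _ in range(n_queries):
        midpoint = (low + high) / 2.0
        if query_label(midpoint) == 1:
            high = midpoint   # threshold is at or below the midpoint
        else:
            low = midpoint    # threshold is above the midpoint
    return (low + high) / 2.0

hidden_threshold = 0.3141     # hypothetical ground truth, unknown to the learner

def oracle(x):
    """Stand-in for an experiment or a human labeler."""
    return 1 if x >= hidden_threshold else 0

estimate = learn_threshold(oracle)
```

Twenty adaptive queries narrow the interval to a width of 2^-20, whereas labeling points chosen in advance would need far more queries for the same precision.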

