Statistics 
Machine learning 
Notes 
Statistics: data point, record, row of data
Machine learning: example, instance
Notes: Both domains also use "observation," which can refer to a single measurement or an entire vector of attributes, depending on context.
Statistics: response variable, dependent variable
Machine learning: label, output
Notes: Both domains also use "target." Since practically all variables depend on other variables, the term "dependent variable" is potentially misleading.
Statistics: variable, covariate, predictor, independent variable
Machine learning: feature
Notes: The term "independent variable" exists for historical reasons but is usually misleading: such a variable typically depends on other variables in the model.
Statistics: regressions
Machine learning: supervised learners, machines
Notes: Both estimate output(s) in terms of input(s).
Statistics: estimation
Machine learning: learning
Notes: Both translate data into quantitative claims, becoming more accurate as the supply of relevant data increases.
Statistics: hypothesis ≠ classifier
Machine learning: hypothesis
Notes: In both statistics and ML, a hypothesis is a scientific statement to be scrutinized, such as "The true value of this parameter is zero." In ML (but not in statistics), a hypothesis can also refer to the prediction rule output by a learning algorithm.
Statistics: bias ≠ regression intercept
Machine learning: bias
Notes: In statistics, "bias" refers to the systematic error of an estimator; in ML, "bias" often also denotes a model's intercept term (e.g., the bias unit in a neural network).
Statistics: Maximize the likelihood to estimate model parameters.
Machine learning: If your target distribution is discrete (as in logistic regression), minimize the cross-entropy to derive the best parameters. If your target distribution is continuous, just maximize the likelihood.
Notes: For discrete distributions, maximizing the likelihood is equivalent to minimizing the cross-entropy.
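The equivalence can be checked numerically: for a Bernoulli model, the negative log of the likelihood equals the summed cross-entropy loss. A minimal sketch (the labels and predicted probabilities below are invented for illustration):

```python
import numpy as np

# Illustrative binary labels and model-predicted probabilities P(y=1)
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

# Statistics view: likelihood of the data under a Bernoulli model
likelihood = np.prod(p**y * (1 - p)**(1 - y))
nll = -np.log(likelihood)

# ML view: cross-entropy loss summed over the examples
cross_entropy = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

print(np.isclose(nll, cross_entropy))  # prints True
```

Maximizing the likelihood and minimizing the cross-entropy therefore select the same parameters.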


The principle of maximum entropy is conceptual and does not refer to maximizing a concrete objective function. The principle is that models should be conservative, in the sense of being no more confident in their predictions than the data thoroughly justify. In practice this works out to deriving an estimation procedure from a bare-minimum set of criteria.
Statistics: logistic/multinomial regression
Machine learning: maximum entropy, MaxEnt
Notes: They are equivalent except in special multinomial settings like ordinal logistic regression. Note that "maximum entropy" here refers to the principle of maximum entropy, not the form of the objective function; indeed, in MaxEnt you minimize a cross-entropy expression rather than maximize an entropy.
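To make the equivalence concrete, here is a minimal sketch of a multinomial logistic regression, i.e. a MaxEnt classifier, fit by gradient descent on the cross-entropy. The toy data, learning rate, and iteration count are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-class problem: 60 points in 2D with made-up labels in {0, 1, 2}
X = rng.normal(size=(60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 0] > 1).astype(int)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((2, 3))        # one weight column per class
onehot = np.eye(3)[y]

losses = []
for _ in range(200):
    P = softmax(X @ W)      # predicted class probabilities
    losses.append(-np.mean(np.sum(onehot * np.log(P), axis=1)))  # cross-entropy
    W -= 0.5 * X.T @ (P - onehot) / len(X)  # gradient step on the cross-entropy

print(losses[0], losses[-1])  # the loss decreases as the fit improves
```

The objective being minimized here is the cross-entropy (equivalently, the negative log-likelihood), despite the model's "maximum entropy" name.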
Statistics: X causes Y if surgical (or randomized controlled) manipulations in X are correlated with changes in Y.

Notes: The stats definition is more aligned with common-sense intuition than the ML one proposed here. In fairness, not all ML practitioners are so abusive of causation terminology, and some of the blame belongs to even earlier abuses such as Granger causality.
Statistics: structural equations model
Machine learning: Bayesian network
Notes: These are nearly equivalent mathematically, although interpretations differ by use case.
Statistics: sequential experimental design
Machine learning: active learning, reinforcement learning, hyperparameter optimization
Notes: Although these four subfields differ greatly in their standard use cases, they all address optimization via a sequence of queries/experiments.
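The common thread, choosing each query based on what previous queries revealed, can be sketched in its simplest form: learning an unknown threshold by always querying the most informative point. The hidden threshold and tolerance below are invented for illustration:

```python
# Minimal sequential-querying sketch: learn an unknown threshold t on [0, 1],
# where label(x) = 1 iff x >= t. Each "experiment" queries one point's label;
# querying the midpoint of the current uncertainty interval (binary search)
# is the active strategy.
t = 0.6180339887  # hidden ground truth (illustrative)

lo, hi = 0.0, 1.0
queries = 0
while hi - lo > 1e-6:
    x = (lo + hi) / 2       # most informative query point
    label = int(x >= t)     # run the experiment / ask the oracle
    if label:
        hi = x
    else:
        lo = x
    queries += 1

print(queries, (lo + hi) / 2)  # ~20 queries locate t to within 1e-6
```

Active learning, reinforcement learning, hyperparameter optimization, and sequential experimental design all generalize this idea to noisier, higher-dimensional settings.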