
Foundations of Statistics Sec. 1 & 2 under Shirin Cheat Sheet (DRAFT)

Intended to assist those with a Computer Science (or otherwise unrelated to Statistics) background in the study and completion of the Foundations of Statistics modules with Shirin.

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Basic Mathematical Symbols and Explanations

∑: Sum of the following set/set of values returned from a function. There's usually a variable name and assignment underneath it, and a limit above - this means you are summing the values returned by the function, with the value ranging from the bottom bound to the top bound.
int sum = 0;
for (int i = 0; i <= 10; i++) {
    sum += do_function(i);
}
∏: Product of the following set/set of values returned from a function. There's usually a variable name and assignment underneath it, and a limit above - this means you are getting the product of all the values returned by the function, with the value ranging from the bottom bound to the top bound.
int prod = 1;  /* must start at 1, not 0, or the product is always 0 */
for (int i = 0; i <= 10; i++) {
    prod *= do_function(i);
}
∀: For all/for every instance. Universal quantifier in predicate logic, i.e. the statement holds true in every case. Can be further expanded as ∀i (assuming i is defined in the previously stated function) followed by a set or function, which reads as "the statement holds true for all i in the following set/function".
ℝ: Real numbers, i.e. not imaginary numbers (the sqrt of a negative number) and not infinity. Integers, negatives, floats, doubles etc. are all considered "real numbers".
∫: Integral. Used for finding areas, volumes, central points etc. Not confident in my own summary, please follow this link: https://www.mathsisfun.com/calculus/integration-introduction.html
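For a programmer's intuition, an integral can be approximated by summing the areas of many thin rectangles under the curve. A rough C sketch of this idea (the midpoint rule; the function and bounds are my own illustrative choices):

#include <stdio.h>

/* Approximate the integral of f from a to b using n thin rectangles. */
double integrate(double (*f)(double), double a, double b, int n) {
    double width = (b - a) / n, area = 0.0;
    for (int i = 0; i < n; i++) {
        double mid = a + (i + 0.5) * width; /* midpoint of rectangle i */
        area += f(mid) * width;             /* height * width */
    }
    return area;
}

double square(double x) { return x * x; }

int main(void) {
    /* The integral of x^2 from 0 to 1 is exactly 1/3. */
    printf("%f\n", integrate(square, 0.0, 1.0, 1000));
    return 0;
}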
lim (a→x): The limit, i.e. the value the following function approaches as a approaches x. e.g.
lim (a→-∞) F(a) = 0
means that as a gets arbitrarily close to -∞, the value of F(a) gets arbitrarily close to 0, i.e. the function's lower limit is 0. Can be used to define upper limits with +∞, and limits for discrete variables by specifying their unique upper and lower bounds, e.g.
lim (a→6) f(a) = 1
where 0 <= a <= 6
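For example, take F(a) = 2^a: F(-1) = 0.5, F(-10) ≈ 0.001 and F(-100) ≈ 8×10⁻³¹. The values approach (but never reach) 0, so lim (a→-∞) F(a) = 0.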

Definitions, Properties, Rules and Laws

The Additivity Property
If A∩B = ∅ then P(A∪B) = P(A)+P(B)

If A∩B != ∅ then P(A∪B) = P(A)+P(B)-P(A∩B)

P(Aᶜ) = 1 - P(A)
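A quick worked example (my own) on one fair die roll: let A = {1,2} and B = {2,3}, so A∩B = {2} ≠ ∅ and P(A∪B) = 2/6 + 2/6 - 1/6 = 3/6 = 1/2, while for the disjoint events A = {1,2} and C = {5,6}, P(A∪C) = 2/6 + 2/6 = 2/3.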
The Multiplication Rule
P(A∩B) = P(A|B)P(B)
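e.g. drawing two cards without replacement: P(both aces) = P(2nd ace | 1st ace)P(1st ace) = (3/51)(4/52) = 1/221.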
The Law of Total Probability
Given disjoint events B1, B2, ..., Bm such that
B1 ∪ B2 ∪ ... ∪ Bm = Ω
(i.e. the union of all events B1 through Bm is the same as the entire sample space)
then the probability of a random/arbitrary event A is expressed as...
P(A) = Σ(i=1..m) P(A|Bi)P(Bi)
(i.e. the sum, over all the Bi, of the probability that Bi occurs and that A then occurs given Bi)
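A worked example (my own): pick one of two coins with P(B1) = P(B2) = 1/2, where B1 is fair and B2 has heads on both sides, and let A = "the flip lands heads". Then P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) = (0.5)(0.5) + (1)(0.5) = 0.75.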
Bayes' Rule
Given disjoint events B1, B2, ..., Bm and
B1 ∪ B2 ∪ ... ∪ Bm = Ω
(i.e. the union of all events B1 through Bm is the same as the entire sample space)
then the conditional probability of Bi, given that a random/arbitrary event A occurs, is...
P(Bi|A) = P(A|Bi)P(Bi) / Σ(j=1..m) P(A|Bj)P(Bj)
(i.e. the probability of Bi given that A occurs is calculated by dividing the probability P(A∩Bi) <according to the multiplication rule> by the sum of the probabilities of A intersecting every event in the sample space <again according to the multiplication rule>)
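Continuing the two-coin example from the Law of Total Probability: given that the flip landed heads, the probability the two-headed coin was picked is P(B2|A) = P(A|B2)P(B2) / (P(A|B1)P(B1) + P(A|B2)P(B2)) = 0.5/0.75 = 2/3.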
Properties of the Probability Mass Function aka pmf
All probabilities are positive: fX(x) ≥ 0.

Any event in the distribution (e.g. "scoring between 20 and 30") has a probability of happening between 0 and 1 (e.g. 0% and 100%).

The sum of all probabilities is 100% (i.e. 1 as a decimal): Σ fX(x) = 1.

The probability of an event A is found by adding up the pmf values of the x's in A: P(X ∈ A) = Σ(x∈A) fX(x)
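A rough C sketch of this last property, using the pmf of a fair six-sided die (my own illustrative values):

#include <stdio.h>

int main(void) {
    /* pmf of a fair die: f(x) = 1/6 for x = 1..6 (index 0 unused) */
    double f[7] = {0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0};

    /* Event A = "roll is even" = {2,4,6}: P(X in A) = sum of f(x) for x in A */
    int A[] = {2, 4, 6};
    double p = 0.0;
    for (int i = 0; i < 3; i++)
        p += f[A[i]];

    printf("P(X in A) = %f\n", p); /* 0.5 */
    return 0;
}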
Properties of the Cumulative Distribution Function aka cdf
1. If a <= b then F(a) <= F(b), i.e. the cdf never decreases.*

2. F(a) is a probability: 0 <= F(a) <= 1, with
lim (a→+∞) F(a) = 1
lim (a→-∞) F(a) = 0
i.e. F(a) will never return a result bigger than 1 or smaller than 0.

3. F is right-continuous:
lim (b→0⁺) F(a+b) = F(a)

*a <= b implies that the event {X <= a} is contained in (a subset of) the event {X <= b}
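A rough C sketch of how a discrete cdf is built from a pmf and used to find the probability between two values (fair-die values again, purely illustrative):

#include <stdio.h>

int main(void) {
    /* pmf of a fair die, and its cdf F(x) = sum of f(k) for k <= x */
    double f[7] = {0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0};
    double F[7] = {0};
    for (int x = 1; x <= 6; x++)
        F[x] = F[x - 1] + f[x]; /* cumulative sum: non-decreasing, ends at 1 */

    /* P(2 < X <= 5) = F(5) - F(2) */
    printf("P(2 < X <= 5) = %f\n", F[5] - F[2]); /* 3/6 = 0.5 */
    return 0;
}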
Properties of Expectation aka E(X)
E(aX) = aE(X) for any constant a
E(XY) = E(X)E(Y) when X and Y are independent
E(a+bX) = a+bE(X) (linearity)
E(X+Y) = E(X)+E(Y) (linearity)
E[Σ(i=1..n) Xi] = Σ(i=1..n) E[Xi]
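e.g. for one fair die with E(X) = 3.5: E(2X + 1) = 2(3.5) + 1 = 8, and for two independent dice E(X1 + X2) = 3.5 + 3.5 = 7.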
Properties of Variance aka Var(X)
Var(aX) = a²Var(X) for any constant a
Var(a+X) = Var(X) for any constant a
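e.g. tripling a die roll scales the spread, Var(3X) = 9Var(X), while shifting it changes nothing, Var(X + 10) = Var(X).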
 

Probability/Statistics & Set Notation

P(A)
Probability of event A occurring. (Number of ways event A can occur / number of total possible outcomes)
Ω
Sample Space/Universe. P(Ω) = 1
∅
Empty/Null set
P(A∩B)
Probability of A Intersection B
Disjoint/Mutually Exclusive
If A∩B = ∅ then A and B are disjoint (mutually exclusive). Note: disjoint is not the same as independent; A and B are independent when P(A∩B) = P(A)P(B).
P(A∪B)
If A and B are disjoint, P(A∪B) = P(A) + P(B)

If not disjoint, P(A∪B) = P(A) + P(B) - P(A∩B)
Aᶜ
A complement. Everything outside A. P(Aᶜ) = 1 - P(A)
A∈B / A∉B
A is an element of B / A is not an element of B
A: A ∈ B
A such that A is an element of B
n! aka Permutations
Counting method where ORDER matters. n! = n(n-1)(n-2)...(2)(1); the number of ordered arrangements of k items drawn from n is n!/(n-k)! = n(n-1)...(n-k+1), where k = sample size
(n k) aka Combinations
Counting method where order does not matter. (n k) = n!/(k!(n-k)!)
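A rough C sketch of these counting functions (function names are my own; real code should watch for overflow at larger n):

#include <stdio.h>

/* n! = n * (n-1) * ... * 2 * 1 */
unsigned long long factorial(int n) {
    unsigned long long f = 1;
    for (int i = 2; i <= n; i++)
        f *= i;
    return f;
}

/* Ordered arrangements of k items from n: n!/(n-k)! */
unsigned long long permutations(int n, int k) {
    return factorial(n) / factorial(n - k);
}

/* Unordered selections of k items from n: n!/(k!(n-k)!) */
unsigned long long combinations(int n, int k) {
    return factorial(n) / (factorial(k) * factorial(n - k));
}

int main(void) {
    printf("5P2 = %llu\n", permutations(5, 2)); /* 20 */
    printf("5C2 = %llu\n", combinations(5, 2)); /* 10 */
    return 0;
}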
P(A|B) aka Conditional Probability
The probability of A happening, given that B occurs.

If A and B are independent then P(A|B) = P(A), as B has no effect on A. (If A and B are disjoint, they cannot both occur, so P(A|B) = 0.)

In general, for P(B) > 0, P(A|B) = P(A∩B)/P(B); this is how B's effect on the chances of A is computed when they are dependent.

P(A|B) + P(Aᶜ|B) = 1
P(Bi|A)P(A) = P(A|Bi)P(Bi)
Both sides equal P(A∩Bi) by the multiplication rule; rearranging for P(Bi|A) and expanding P(A) with the law of total probability gives Bayes' rule.
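A quick worked example on one fair die: A = "roll a 6", B = "roll is even". P(A|B) = P(A∩B)/P(B) = (1/6)/(1/2) = 1/3, and indeed P(A|B) + P(Aᶜ|B) = 1/3 + 2/3 = 1.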
Independence of more than 2 events
Events A1, A2, ..., Am are independent if
P(A1 ∩ A2 ∩ ... ∩ Am) = P(A1)P(A2)...P(Am)
(i.e. they are independent events if the probability of their joint intersection equals the product of their individual probabilities; strictly, this must also hold for every sub-collection of the events)

A and B are independent, and B and C are independent. This does not mean that A and C are independent, nor does it mean they must be dependent.
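A standard illustration: flip two fair coins and let A = "first coin is heads", B = "second coin is heads", C = "both coins show the same face". Every pair is independent (each pairwise intersection has probability 1/4 = 1/2 × 1/2), yet P(A∩B∩C) = 1/4 ≠ 1/8 = P(A)P(B)P(C), so the three are not mutually independent.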
Random variable aka rv
Any variable whose value is not known prior to the experiment and is subject to chance, aka variability.
Has an associated probability, aka mass.

An rv is a type of mapping function over the whole sample space and is associated with measure theory, i.e. an rv can transform the sample space.
Discrete
There is a countable set of possible outcomes.
Discrete Random Variable
Any function X: Ω→ℝ that takes on some value, e.g. X could be S = sum or M = max applied to a sample space, taking the sum/max of each experiment outcome and constructing a new sample space out of it.
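A rough C sketch of an rv as a mapping, using S = sum of two fair dice: every outcome (d1, d2) in Ω is mapped to a number, and tallying the mapped values builds the new distribution (this is exactly the pmf discussed next):

#include <stdio.h>

int main(void) {
    /* Omega = all 36 equally likely outcomes (d1, d2); S maps each to d1+d2 */
    double pmf[13] = {0}; /* pmf[s] = P(S = s) for s = 2..12 */
    for (int d1 = 1; d1 <= 6; d1++)
        for (int d2 = 1; d2 <= 6; d2++)
            pmf[d1 + d2] += 1.0 / 36.0;

    for (int s = 2; s <= 12; s++)
        printf("P(S = %2d) = %f\n", s, pmf[s]);
    return 0;
}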
Probability Mass Function aka pmf
The pmf of some discrete rv X. Essentially a table/graph displaying the probabilities of all possible values our discrete rv can take. Please refer to "Properties of the Probability Mass Function aka pmf" for more details. Explained here: http://www.statisticshowto.com/probability-mass-function-pmf/
Cumulative Distribution Function aka cdf
The cdf of some discrete rv can be used to determine the probability of values occurring above, below or between given values. Please refer to "Properties of the Cumulative Distribution Function aka cdf" for more details. Explained here: http://www.statisticshowto.com/cumulative-distribution-function/
Continuous
An infinite number of possible values.
Continuous Random Variables
A function X: Ω→ℝ that can take on any value a∈ℝ.
Mass/associated probability is no longer considered for each possible value of X; instead consider the likelihood that X∈(a,b) for a<b.
Probability Density Function aka pdf
The pdf f(x) of a continuous rv X is an integrable function such that...
P(a <= X <= b) = ∫(a to b) f(x)dx
i.e. it is the area under the curve between points a and b, and therefore the probability of a range of values occurring, subject to the conditions on f:
f(x) >= 0 ∀ x∈Ω
∫(-∞ to +∞) f(x)dx = 1, i.e. the complete area under the curve contains all outcomes.

The cdf F(x) of a continuous rv is defined by the formula...
F(x) = ∫(-∞ to x) f(u)du = P(X <= x)
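e.g. for the uniform pdf f(x) = 1 on [0,1] (and 0 elsewhere): P(0.2 <= X <= 0.5) = ∫(0.2 to 0.5) 1 dx = 0.3, and the total area ∫(0 to 1) 1 dx = 1, as required.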
Expectation aka E(X)
The expected value of a random variable. This is found using the following formula when our rv is discrete:
E(X) = Σ(xi∈Ω) xi p(xi)

and the following formula when the rv is continuous:
E(X) = ∫ x f(x)dx

To make this easier to understand: the expected value is simply the mean, calculated as the sum of (each possible value multiplied by its individual probability), i.e. the sum of values weighted by their probabilities.
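A rough C sketch of the discrete formula for a fair die, where E(X) = 3.5:

#include <stdio.h>

int main(void) {
    /* E(X) = sum over all xi of xi * p(xi), here for a fair die */
    double p[7] = {0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0};
    double ex = 0.0;
    for (int x = 1; x <= 6; x++)
        ex += x * p[x]; /* each value weighted by its probability */
    printf("E(X) = %f\n", ex); /* 3.5 */
    return 0;
}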
Variance aka Var(X)
A method of measuring how far the actual value of an rv may be from the expected value. Given a discrete rv X, the formula is:
Var(X) = Σ(xi∈Ω) xi² p(xi) - (Σ(xi∈Ω) xi p(xi))²

Or, given a continuous rv, use the formula:
Var(X) = ∫ x² f(x)dx - (∫ x f(x)dx)²

In other words, we sum up (the squared values multiplied by their individual probabilities) and finally deduct the squared expected value, i.e. Var(X) = E(X²) - E(X)².
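Extending the expectation sketch above to the discrete variance formula (for a fair die, Var(X) = 35/12 ≈ 2.9167):

#include <stdio.h>
#include <math.h>

int main(void) {
    double p[7] = {0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0};
    double ex = 0.0, ex2 = 0.0;
    for (int x = 1; x <= 6; x++) {
        ex  += x * p[x];     /* E(X)   */
        ex2 += x * x * p[x]; /* E(X^2) */
    }
    double var = ex2 - ex * ex; /* Var(X) = E(X^2) - E(X)^2 */
    printf("Var(X) = %f\n", var);      /* 2.9167 */
    printf("sd(X) = %f\n", sqrt(var)); /* 1.7078; see Standard Deviation */
    return 0;
}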
Standard Deviation
Another measure, similar to variance, of how far a distribution spreads from the mean, i.e. the actual value vs the expected value. Simply calculated as sqrt(Var(X)). The benefit is that it is expressed in the same units as X, rather than in squared units as variance is.
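e.g. for the fair die above, sd(X) = sqrt(35/12) ≈ 1.71, measured in the same units (pips) as the roll itself.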