
Economic Statistics - Midterm 2.2 Cheat Sheet (DRAFT)

Cheat sheet — Probability, Random Variables, Sampling

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Probability basics (definitions & rules)

Probability Basics
What probability actually means:
• Probability is a number that tells you how likely something is to happen.
• It’s always between 0 and 1:
  ▸ 0 = impossible
  ▸ 1 = guaranteed
  ▸ 0.5 = 50% chance
You can think of probability as the long-run frequency of something happening if you repeated it a bunch of times.
  Example: if you flip a fair coin 1,000 times, about half the flips will be heads → P(Heads) = 0.5.

Key probability symbols

• Event: any outcome or collection of outcomes you’re interested in.
  Example: “Rolling an even number” on a die → that’s the event A = {2, 4, 6}.

• Sample space (S): all possible outcomes.
  Example: rolling a die →
  S = {1, 2, 3, 4, 5, 6}.

• P(A): “The probability that event A happens.”
  Example: P(Even) = 3/6 = 0.5.
 

Types of events

• Disjoint / mutually exclusive: can’t happen at the same time.
  Example: “Rolling a 3” and “Rolling a 4.”
• Independent: one happening doesn’t change the chance of the other.
  Example: Coin toss and rolling a die — they don’t affect each other.

Example: “At least one”

“If there’s a 0.004 chance a test gives a false positive, what’s the probability that at least one out of 200 tests is a false positive?”
This uses the complement rule — it’s easier to find the chance that none are false positives, then subtract from 1.
P(At least one) = 1 − P(None)
Each test has a 0.996 chance of being fine →
P(None) = 0.996^200
So P(At least one) = 1 − 0.996^200
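A quick Python check of this calculation (assuming the 200 tests are independent):

  # Complement rule: P(at least one false positive in 200 independent tests)
  p_fp = 0.004                      # chance a single test is a false positive
  p_none = (1 - p_fp) ** 200        # chance that none of the 200 tests is
  p_at_least_one = 1 - p_none
  print(round(p_at_least_one, 3))   # ≈ 0.551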

Random Variables (RVs)

What they are
A random variable is just a number that represents the outcome of something random.
Example:
Toss a coin twice.
X = number of heads.
Possible X values: 0, 1, or 2.
We can describe all the possible values of X and how likely they are. That list is called a probability distribution.
 

Probability distribution

A table or formula showing:
• Every possible value of X
• The probability of each value
Must satisfy:
  1. All probabilities are between 0 and 1.
  2. They add up to 1.
Example (coin tossed twice, X = number of heads):
  x:       0     1     2
  P(X=x):  0.25  0.50  0.25

Mean (Expected Value)

The expected value (E[X]) or mean (μₓ) tells you the average outcome in the long run.
  E[X] = ∑ (x × P(X=x))
Example (from the table above): E[X] = 0(0.25) + 1(0.50) + 2(0.25) = 1 → on average, 1 head per 2 flips.
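A small Python sketch of the same sum, using the coin-toss distribution from the table:

  # E[X] = sum of value × probability over the whole distribution
  values = [0, 1, 2]            # number of heads in two flips
  probs  = [0.25, 0.50, 0.25]
  ex = sum(x * p for x, p in zip(values, probs))
  print(ex)                     # 1.0 → on average, 1 head per 2 flips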

Variance & Standard Deviation

Var(X) = ∑ (x − μ)² × P(X=x)
SD(X) = √Var(X)
Variance = the probability-weighted average of the squared distances from the mean.
Standard deviation = √variance, roughly the typical distance from the mean.
If SD is small → values are close to the mean.
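Continuing the Python sketch for the coin-toss distribution:

  import math

  values = [0, 1, 2]
  probs  = [0.25, 0.50, 0.25]
  mu  = sum(x * p for x, p in zip(values, probs))               # 1.0
  var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))   # 0.5
  sd  = math.sqrt(var)                                          # ≈ 0.707
  print(var, sd)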
 

Useful shortcuts

E[aX+b] = aE[X] + b
(Constants can be pulled out of the expectation: multiply by a, then add b.)
Var(aX+b) = a² Var(X)
(Adding a constant doesn’t affect spread; multiplying by a stretches it.)
E[X+Y] = E[X] + E[Y]
• If X and Y are independent: Var(X+Y) = Var(X) + Var(Y)
• If correlated: Var(X+Y) = Var(X) + Var(Y) + 2ρσxσy
(ρ = correlation between X and Y)
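A quick Python check of the first two shortcuts on the coin-toss distribution (a = 3 and b = 5 are arbitrary constants picked just for the illustration):

  values = [0, 1, 2]
  probs  = [0.25, 0.50, 0.25]
  a, b = 3, 5

  def E(f):                      # expectation of f(X) under the distribution
      return sum(f(x) * p for x, p in zip(values, probs))

  mu  = E(lambda x: x)
  var = E(lambda x: (x - mu) ** 2)

  # E[aX+b] = aE[X] + b  → both print 8.0
  print(E(lambda x: a * x + b), a * mu + b)
  # Var(aX+b) = a² Var(X)  → both print 4.5
  print(E(lambda x: (a * x + b - (a * mu + b)) ** 2), a ** 2 * var)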

Example: Lottery

You buy a $1 ticket that pays $500 if you win (proba­bility = 1/1000). If you lose, you get $0.
E[X] = (499)(0.001) + (−1)(0.999) = −0.50
You lose 50¢ on average each play → expected loss.
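The same expected-value calculation in Python:

  # Net payoff: +499 if you win (prob 0.001), −1 if you lose (prob 0.999)
  ev = 499 * 0.001 + (-1) * 0.999
  print(ev)                      # -0.5 → expect to lose 50¢ per ticket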

Sampling & Sampling Distributions

Population vs Sample
Population: the entire group you care about (e.g., all students in a school).
Sample: the smaller group you actually measure (e.g., 30 of those students).
We use samples to estimate the truth about populations.

Types of samples

Simple Random Sample (SRS): every individual has an equal chance of being chosen.
Stratified sample: population divided into groups (strata) → random sample from each.
Cluster / multistage: randomly pick clusters, then pick within them.
⚠️ Voluntary response or convenience samples are biased! (not random → results unreliable).

Sampling distribution

If you take many random samples and compute a statistic (like a mean or proportion) for each sample, the distribution of those statistics is the sampling distribution.
We study its:
Center: average (should equal population value if unbiased)
Spread: how much sample results vary
Shape: often becomes bell-shaped for large samples

Unbiased estimator

A statistic is unbiased if its sampling distribution’s center equals the true population parameter.
E[X̄] = μ and E[p̂] = p
Unbiased means, “on average, it hits the right answer.”

How sample size affects variability

Bigger sample → smaller variability (less spread).
SD(X̄) = σ/√n
As n increases, the denominator gets bigger → SD gets smaller.
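A short Python illustration (σ = 10 is just an assumed example value):

  # SD of the sample mean shrinks as the sample size n grows
  sigma = 10
  for n in (4, 25, 100, 400):
      print(n, sigma / n ** 0.5)   # 5.0, 2.0, 1.0, 0.5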

Law of Large Numbers & Central Limit Theorem

Law of Large Numbers
As you take more and more observations (bigger n), the sample mean X̄ gets closer and closer to the population mean μ.
  Example: the average of 10 coin flips might not be 0.5, but the average of 10,000 flips almost definitely will be close to 0.5.
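A small Python simulation of this idea (the seed and numbers of flips are arbitrary choices):

  import random

  # Law of Large Numbers: the running proportion of heads settles near 0.5
  random.seed(42)
  flips = [random.randint(0, 1) for _ in range(10_000)]   # 1 = heads
  for n in (10, 100, 1_000, 10_000):
      print(n, sum(flips[:n]) / n)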

Central Limit Theorem (CLT)

This is super important for exams.
Even if the original population is not Normal, when you take a large enough sample, the distribution of sample means will look Normal (bell-shaped).
  X̄ ∼ N(μ, σ/√n)
Meaning:
Centered at μ (the true mean)
Spread = σ/√n

Why it matters

CLT lets you use z-scores and Normal tables to find probabilities for sample means.
  z = (X̄ − μ) / (σ/√n)
Then use the z-table (or calculator) to find probabilities.

Example (CLT in action)

A machine fills cereal boxes.
μ = 500g, σ = 10g.
Sample 25 boxes.
SD(X̄) = 10/√25 = 2
P(X̄ < 496) = P(Z < (496 − 500)/2) = P(Z < −2) = 0.0228
→ 2.28% chance the sample average weight is below 496g.
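The same CLT calculation checked in Python with the standard Normal CDF:

  from statistics import NormalDist

  mu, sigma, n = 500, 10, 25
  sd_xbar = sigma / n ** 0.5          # SD of the sample mean = 2.0
  z = (496 - mu) / sd_xbar            # = -2.0
  print(NormalDist().cdf(z))          # ≈ 0.0228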