Statistics & Probability Notes
IQR = Q3 - Q1
Outliers are beyond: Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
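A minimal Python sketch of the 1.5 * IQR rule. The function name iqr_outliers and the sample data are made up for illustration. It assumes the "inclusive" quartile method from Python's statistics module; a calculator or textbook may use a different quartile convention and give slightly different Q1 and Q3.

```python
import statistics

def iqr_outliers(data):
    """Return (q1, q3, iqr, outliers) using the 1.5 * IQR rule."""
    # Assumption: "inclusive" quartiles; other conventions shift Q1/Q3 slightly.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return q1, q3, iqr, outliers

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up data
q1, q3, iqr, outliers = iqr_outliers(sample)
print(q1, q3, iqr, outliers)
```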
Calc Entry for Basic Stats
Correlation (R^2 near 1 is better fit)
4-Digit PIN with repetition = 10^4
4-Digit PIN without repetition = 10!/6! = 5040
How many ways can 6 people be ranked? Ranking n objects gives n! orderings, so 6! = 720.
How many ways can 6 people be ranked into 3 places:
n(n-1)...(n-r+1) = n!/(n-r)! or 6 nPr 3 = 120
Combinations = choosing a certain number of objects from a given set (no order).
N choose R or N!/((N-R)! R!) or nCr
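The counting rules above can be checked with Python's math module (3.8 or later). The variable names are only illustrative; math.perm plays the role of nPr and math.comb the role of nCr.

```python
import math

# PINs: 4 positions
pins_with_repetition = 10 ** 4              # 10 choices for every position
pins_without_repetition = math.perm(10, 4)  # 10!/6!

# Rankings
full_ranking = math.factorial(6)            # rank all 6 people
top_three = math.perm(6, 3)                 # 6 nPr 3

# Combinations: order does not matter, n!/((n-r)! r!)
choose_three = math.comb(6, 3)              # 6 nCr 3

print(pins_with_repetition, pins_without_repetition)
print(full_ranking, top_three, choose_three)
```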
3 cities from 15 are chosen randomly for a visit. Group A: 4 cities cost $800 each, B: 5 cost $300 each, C: 6 cost $100 each. What is the probability that the tour will cost $1000 or less?
- All from C: 6 nCr 3
- Two from C, one from B or A: (6nCr2)(5nCr1) + (6nCr2)(4nCr1)
- One from C, two from B: (6nCr1)(5nCr2)
- None from C, three from B: 5nCr3
Compute and sum: 20+75+60+60+10=225; divide by the total 15nCr3 = 455
.495 or 49.5%
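A brute-force way to double-check a case count like this is to list every possible tour and add up its cost. This is only a sketch: each city is represented by nothing but its cost, and the names city_costs, tours and affordable are made up here.

```python
from itertools import combinations

# 4 cities in A ($800), 5 in B ($300), 6 in C ($100)
city_costs = [800] * 4 + [300] * 5 + [100] * 6

# Every unordered choice of 3 distinct cities (15 nCr 3 of them)
tours = list(combinations(range(len(city_costs)), 3))

# Keep the tours whose total cost is $1000 or less
affordable = [t for t in tours if sum(city_costs[i] for i in t) <= 1000]

probability = len(affordable) / len(tours)
print(len(affordable), len(tours), round(probability, 3))
```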
Example of Random Variable
X is the number of heads in 10 flips of a coin. P(X=4)?
(10nCr4)/2^10 = .205
X is sum of two dice.
P(X=5) = 4/36
P(9<=X<=11) = P(X=9)+P(X=10)+P(X=11) = 4/36+3/36+2/36 = 9/36 = .25
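Both random-variable examples can be reproduced by counting outcomes. A small sketch using exact fractions; the helper p_sum is a made-up name, and fair coins and fair dice are assumed.

```python
from fractions import Fraction
from itertools import product
from math import comb

# X = number of heads in 10 flips of a fair coin
p_four_heads = Fraction(comb(10, 4), 2 ** 10)

# X = sum of two fair dice: list all 36 equally likely rolls
rolls = list(product(range(1, 7), repeat=2))

def p_sum(lo, hi=None):
    """P(lo <= X <= hi) for the sum of two dice; hi defaults to lo."""
    hi = lo if hi is None else hi
    hits = sum(1 for a, b in rolls if lo <= a + b <= hi)
    return Fraction(hits, len(rolls))

print(float(p_four_heads), p_sum(5), p_sum(9, 11))
```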
Binomial Random Variables
A Binomial Random Variable represents the number of successes in n independent trials, each with probability p of success. The probability of 4 successes in 10 trials (p=0.5) is (10 nCr 4)/2^10 = 0.205
Calc: binompdf(n,p,r), where r is the number of successes; e.g. binompdf(10,.5,4) = 0.205
Math: (n nCr r) * p^r * (1-p)^(n-r)
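The binomial formula translates directly into a few lines of Python. The function name binom_pmf is an assumption chosen to mirror the calculator's binompdf(n,p,r) argument order; it is a sketch, not a replacement for a statistics library.

```python
from math import comb

def binom_pmf(n, p, r):
    """P(exactly r successes in n independent trials with success probability p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

p_four = binom_pmf(10, 0.5, 4)
print(round(p_four, 3))

# Sanity check: the probabilities for r = 0..n should add up to 1
total = sum(binom_pmf(15, 0.4, r) for r in range(16))
print(round(total, 6))
```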
Binomial with n=15 and p = .4
Continuous & Normal Random Variable
For a continuous RV, P(X=3) (or any other single value) is zero.
Probability is area under curve.
Normal RV is bell curve.
Total area under bell is 1.
P(X<a) is the area under the curve up to x=a.
1 Std. Dev. P(-1<Z<1)= .683
2 Std. Dev. P(-2<Z<2)=.954
3 Std. Dev. P(-3<Z<3)=.997
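The three areas above can be reproduced with Python's statistics.NormalDist (Python 3.8 or later). The helper name within is made up for this sketch.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, std dev 1

def within(k):
    """P(-k < Z < k): area under the bell curve within k std devs of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 3))
```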
X is a normal random variable with mean -3 and standard deviation 0.7
P(-4<X<-3) = normalcdf(-4,-3,-3,.7)
P(X>-2) = normalcdf(-2,1E99,-3,.7)
P(|X-(-3)|>.7)=P(X<-3.7)+P(X>-2.3) = normalcdf(-1E99,-3.7,-3,.7)+normalcdf(-2.3,1E99,-3,.7)
A car model gets 24 mpg on the car sticker. The maker knows that mileage is normally distributed with a std dev of 3 mpg. What is the proportion of cars that get less than 20 mpg? P(X<20)=normalcdf(-1E99,20,24,3)=0.091
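For checking these away from the calculator, here is a rough Python stand-in for normalcdf. The function below is written for this sketch (it is not a library function); it copies the calculator's argument order (lower, upper, mean, std dev) and uses 1e99 for "infinity" the same way the notes do.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu, sigma):
    """P(lower < X < upper) for a normal X with mean mu and std dev sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

BIG = 1e99  # stands in for infinity, as on the calculator

p_between = normalcdf(-4, -3, -3, 0.7)
p_above = normalcdf(-2, BIG, -3, 0.7)
p_tails = normalcdf(-BIG, -3.7, -3, 0.7) + normalcdf(-2.3, BIG, -3, 0.7)
p_low_mpg = normalcdf(-BIG, 20, 24, 3)

for name, value in [("between", p_between), ("above", p_above),
                    ("tails", p_tails), ("low mpg", p_low_mpg)]:
    print(name, round(value, 4))
```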
Conditional Probability
P(A|B) is the probability of A given that B happened.
P(A|B) = P(A intersect B)/P(B)
If P(A|B) = P(A), then A and B are independent.
If independent, P(A intersect B) = P(A)P(B)
If mutually exclusive, P(A|B)=0
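A small sketch of these rules on two fair dice, counting outcomes with exact fractions. The events and helper names (prob, cond_prob and the event functions) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice, 36 rolls

def prob(event):
    """P(event), where event is a function of one (die1, die2) outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A|B) = P(A intersect B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

def sum_at_least_10(o):
    return o[0] + o[1] >= 10

def sum_is_2(o):
    return o[0] + o[1] == 2

print(cond_prob(first_is_six, sum_at_least_10))  # differs from P(A): dependent
print(cond_prob(first_is_six, second_is_six))    # equals P(A): independent
print(cond_prob(first_is_six, sum_is_2))         # mutually exclusive events
```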
Central Limit Theorem
As n becomes large, the sample mean becomes approximately normally distributed with mean mu (the population mean) and standard deviation sigma/sqrt(n), where sigma is the population std dev.
As n gets large, the spread in the sample mean distribution narrows. This means that the sample mean is more likely to be near the true mean.
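A quick simulation can make the narrowing visible. This sketch assumes a uniform population on [0, 1] (mean 0.5, std dev sqrt(1/12)); the sample size, number of trials and seed are arbitrary choices, not values from the notes.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n, trials = 50, 2000     # arbitrary illustrative sizes

pop_mean = 0.5
pop_sd = (1 / 12) ** 0.5  # std dev of a uniform [0, 1] population

# Draw many samples of size n and record each sample mean
sample_means = [
    statistics.fmean(rng.random() for _ in range(n))
    for _ in range(trials)
]

observed_sd = statistics.stdev(sample_means)
predicted_sd = pop_sd / n ** 0.5  # std dev / sqrt(n)

print(round(statistics.fmean(sample_means), 3))
print(round(observed_sd, 4), round(predicted_sd, 4))
```

Raising n should shrink both the observed and predicted spread together.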
Z-test: the test statistic is approximated by the normal distribution. Use it if the sample size is large or the population variance is known.
T-score/test is used when:
- sample size is below 30
- population standard deviation is unknown (estimated from your sample data)
otherwise use z-score/test.
Generally use 95% confidence level.
Z-Stat represents how many std. deviations the sample mean is away from the hypothesized mean.
Here the std. dev of the sample mean is (population std dev)/sqrt(n)
The null hypothesis assumes that whatever you are trying to prove did not happen.
p-value of 0.03 means there is a 3% chance of finding a difference as large as or larger than the one in your study given the null hypothesis is true.
If the p-value is 0.05 or less, you typically reject the null hypothesis.
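A minimal sketch of a z-test in Python. It assumes a two-sided test with a known population std dev; the function name z_test and the numbers (36 cars averaging 23 mpg against a claimed 24 mpg with std dev 3) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean == mu0. Returns (z, p_value)."""
    standard_error = sigma / sqrt(n)      # std dev of the sample mean
    z = (sample_mean - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 36 cars average 23 mpg; claimed mean 24, sigma 3
z, p = z_test(sample_mean=23, mu0=24, sigma=3, n=36)
reject_null = p <= 0.05
print(round(z, 2), round(p, 4), reject_null)
```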
Type 1 error: rejecting the null hypothesis when it is true
Type 2 error: failing to reject the null hypothesis when it is false
Derivatives and Tangent Lines
The derivative of a function f at x is the slope of the tangent at x. If all of the slopes are assembled you get f'(x) or df/dx.
If we know f'(a) and f(a), then the tangent line at x=a is y=f'(a)(x-a)+f(a)
Slope is f'(a) and line passes through (a,f(a))
To approximate the slope on the calculator, use nDeriv
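A rough Python sketch of a numerical derivative and the tangent line built from it. It uses a central difference with an assumed step size h; the names nderiv and tangent_line are made up here, and this is not claimed to match the calculator's internals exactly.

```python
def nderiv(f, a, h=1e-3):
    """Central-difference estimate of f'(a); h is an assumed step size."""
    return (f(a + h) - f(a - h)) / (2 * h)

def tangent_line(f, a):
    """Return the line y(x) = f'(a)(x - a) + f(a) as a function of x."""
    slope = nderiv(f, a)
    return lambda x: slope * (x - a) + f(a)

def f(x):
    return x ** 2  # example function; exact derivative is 2x

line = tangent_line(f, 3)
print(round(nderiv(f, 3), 6), round(line(3), 6), round(line(3.1), 6))
```

Near x=a the tangent line stays close to f, which is why it works as a local approximation.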
Computing Areas & Integrals
The definite integral of f from a to b is the area underneath the curve from a to b.
Where f is negative, the area contributed is a negative area.
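The signed-area idea can be seen numerically with a midpoint Riemann sum. This is a sketch: the function name integrate and the number of slices are arbitrary, and sin(x) is just a convenient example that is positive on [0, pi] and negative on [pi, 2*pi].

```python
import math

def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum approximating the definite integral of f from a to b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

above = integrate(math.sin, 0, math.pi)            # f >= 0 here: positive area
below = integrate(math.sin, math.pi, 2 * math.pi)  # f <= 0 here: negative area
whole = integrate(math.sin, 0, 2 * math.pi)        # the two parts cancel

print(round(above, 4), round(below, 4), round(whole, 4))
```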
Fundamental Theorem of Calculus
A function f(x) converges to a limit L at x=a if, for any given error tolerance, we can specify a range of x around a such that for any x in that range, f(x) is within that tolerance of L.
Min: f goes from decreasing to increasing
f' goes from negative to positive.
Max: f goes from increasing to decreasing.
f' goes from positive to negative.
Flat: f continues to change in the same way.
f' does not change sign.
f" gives concavity.
Concave up means second derivative is positive which means first derivative is increasing
Concave down: f"<0, f' decreasing.
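The sign-change test for f' can be sketched in code. The function classify and the step h are assumptions made for this example; it only looks at the sign of f' just left and just right of a point c where f'(c)=0.

```python
def classify(fprime, c, h=1e-3):
    """Classify a critical point c by the sign of f' on each side of it."""
    left, right = fprime(c - h), fprime(c + h)
    if left < 0 < right:
        return "min"   # f' goes from negative to positive
    if left > 0 > right:
        return "max"   # f' goes from positive to negative
    return "flat"      # f' does not change sign

def fp(x):
    return 3 * x ** 2 - 3   # derivative of f(x) = x^3 - 3x

def gp(x):
    return 3 * x ** 2       # derivative of g(x) = x^3

print(classify(fp, -1), classify(fp, 1), classify(gp, 0))
```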
Max/Min Word Problems
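10 meters of string. What dimensions give the maximum rectangular area?
Perimeter: P(l,w) = 2l+2w, P=10, so w = 5-l
Area: A(l) = l*w = l(5-l); A'(l) = 5-2l = 0
max is at l=2.5 (so w=2.5, a square with area 6.25)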
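A simple numeric check of the string problem, assuming the string forms a rectangle with sides l and w. The grid search below is a sketch (the step of 0.01 is arbitrary); it just tries many lengths and keeps the one with the largest area.

```python
def area(l, perimeter=10):
    """Area of a rectangle with side l and the given perimeter (2l + 2w)."""
    w = perimeter / 2 - l
    return l * w

# Try lengths 0.00, 0.01, ..., 5.00 and keep the one with the largest area
candidates = [i / 100 for i in range(0, 501)]
best_l = max(candidates, key=area)

print(best_l, area(best_l))
```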
Newton's Method
1. Pick a, an initial guess.
2. Compute tangent line approximation: y = f(a)+f'(a)(x-a)
3. Solve y=0 and get x = (f'(a)a-f(a))/f'(a) = a - f(a)/f'(a)
4. Use x for the next guess. Repeat.
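These four steps are commonly known as Newton's method for finding a root of f. A minimal sketch: the function name newton, the stopping tolerance, and the example f(x) = x^2 - 2 are assumptions for illustration, and the code does not guard against f'(a) = 0.

```python
def newton(f, fprime, a, steps=20, tol=1e-12):
    """Repeat x = a - f(a)/f'(a) until the guess stops changing."""
    for _ in range(steps):
        x = a - f(a) / fprime(a)  # where the tangent line at a crosses y = 0
        if abs(x - a) < tol:
            return x
        a = x                     # use x as the next guess
    return a

# Example: the positive root of x^2 - 2 is sqrt(2)
root = newton(lambda x: x * x - 2, lambda x: 2 * x, a=1.0)
print(root)
```

A poor initial guess can make the iteration wander or fail, so in practice it helps to start near the root.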