Statistics & Probability Notes
Quartiles
IQR = Q3 - Q1
Outliers are beyond: Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
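A minimal Python sketch of the 1.5 * IQR fence rule. The data set is made up, and the "inclusive" quartile method is an assumption: calculators and textbooks use slightly different quartile conventions, so Q1 and Q3 may differ a little from a calculator's output.

```python
import statistics

def iqr_outliers(data):
    """Return (q1, q3, iqr, outliers) using the 1.5 * IQR fences."""
    # method="inclusive" is one of several quartile conventions (an assumption here).
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return q1, q3, iqr, outliers

sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # made-up data for illustration
q1, q3, iqr, outliers = iqr_outliers(sample)
print(q1, q3, iqr, outliers)
```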
Calc Entry for Basic Stats
Correlation (R^2 near 1 is better fit)
>COMBINATORIAL PROBABILITY<
4-digit PIN with repetition = 10^4
4-digit PIN without repetition = 10!/6! = 5040
Permutations
How many ways can 6 people be ranked? Ranking n objects leads to n! orderings.
How many ways can 6 people be ranked into 3 places:
n(n-1)*...*(n-r+1) = n!/(n-r)! or 6 nPr 3
Combinations
Combinations = choosing a certain number of objects from a given set (no order).
n choose r = n!/((n-r)! r!) or n nCr r
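The counting rules above can be checked in Python with the standard `math` module. This is only a sketch; the variable names are illustrative, and `perm` and `comb` play the roles of the calculator's nPr and nCr.

```python
from math import comb, factorial, perm

pins_with_repetition = 10 ** 4           # each of 4 digits chosen freely
pins_without_repetition = perm(10, 4)    # 10!/6!, ordered, no repeats
rankings_of_six = factorial(6)           # rank all 6 people
top_three_of_six = perm(6, 3)            # 6 nPr 3, ordered top 3
three_of_fifteen = comb(15, 3)           # 15 nCr 3, unordered choice

print(pins_with_repetition, pins_without_repetition,
      rankings_of_six, top_three_of_six, three_of_fifteen)
```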
Example
3 cities from 15 are chosen randomly for a visit. Group A: 4 cities that cost $800 each, B: 5 cities that cost $300 each, C: 6 cities that cost $100 each. What is the probability that the tour will cost $1000 or less?
 All from C: 6 nCr 3
 Two from C, one from B or A: (6nCr2)(5nCr1) + (6nCr2)(4nCr1)
 One from C, two from B: (6nCr1)(5nCr2)
 None from C, three from B: 5nCr3
Compute and sum: 20+75+60+60+10=225; divide by the total 15nCr3 = 455
.495 or 49.5%
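As a cross-check on the case-by-case count, a brute-force sketch in Python can list every choice of 3 cities and keep those costing $1000 or less. It assumes the reading that group A has 4 cities at $800 each, B has 5 at $300 each, and C has 6 at $100 each.

```python
from itertools import combinations

# 15 cities: 4 in group A ($800), 5 in group B ($300), 6 in group C ($100)
costs = [800] * 4 + [300] * 5 + [100] * 6

tours = list(combinations(range(15), 3))   # every unordered choice of 3 cities
affordable = [t for t in tours if sum(costs[i] for i in t) <= 1000]

probability = len(affordable) / len(tours)
print(len(affordable), len(tours), round(probability, 3))
```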
>RANDOM VARIABLES<
Example of Random Variable
X is the number of heads in 10 flips of a coin. P(X=4)?
(10nCr4)/2^10 = .205
X is sum of two dice.
P(X=5) = 4/36
P(9<=X<=11) = P(X=9)+P(X=10)+P(X=11)= 4/36+3/36+2/36 
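Both examples can be reproduced by direct counting. The sketch below uses exact fractions; the helper name `p_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

# X = number of heads in 10 fair flips: P(X = 4)
p_four_heads = Fraction(comb(10, 4), 2 ** 10)

# X = sum of two dice: list all 36 equally likely outcomes
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

def p_sum(condition):
    """Probability that the sum of two dice satisfies `condition`."""
    return Fraction(sum(1 for s in sums if condition(s)), len(sums))

p_five = p_sum(lambda s: s == 5)
p_nine_to_eleven = p_sum(lambda s: 9 <= s <= 11)
print(float(p_four_heads), p_five, p_nine_to_eleven)
```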
Binomial Random Variables
A binomial random variable represents the number of successes in n trials, each with probability p of success. The probability of 4 successes in 10 trials (p=0.5) is (10 nCr 4)/2^10 = 0.205
Calc: binompdf(10,.5,4), arguments are (n,p,r); r is the number of successes
Math: (n nCr r) * p^r * (1-p)^(n-r)
Binomial Random Variables
Binomial with n=15 and p = .4
Compute:
P(X=3)=binompdf(15,.4,3)
P(X<=3)=binomcdf(15,.4,3)
P(X<3)=binomcdf(15,.4,2)
P(X>3)=1-binomcdf(15,.4,3)
P(X>=3)=1-binomcdf(15,.4,2)
P(4<X<=8)=binomcdf(15,.4,8)-binomcdf(15,.4,4)
P(1.3<X<1.7)=0 (X only takes whole-number values)
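For readers without the calculator, here is a rough Python equivalent. The functions `binompdf` and `binomcdf` below are stand-ins written for this sketch, named after the calculator commands; they are built from the formula (n nCr r) * p^r * (1-p)^(n-r).

```python
from math import comb

def binompdf(n, p, r):
    """P(X = r) for a binomial with n trials and success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binomcdf(n, p, r):
    """P(X <= r): add up the pdf from 0 through r."""
    return sum(binompdf(n, p, k) for k in range(r + 1))

n, p = 15, 0.4
p_eq_3 = binompdf(n, p, 3)
p_le_3 = binomcdf(n, p, 3)
p_lt_3 = binomcdf(n, p, 2)
p_gt_3 = 1 - binomcdf(n, p, 3)
p_ge_3 = 1 - binomcdf(n, p, 2)
p_5_to_8 = binomcdf(n, p, 8) - binomcdf(n, p, 4)   # 4 < X <= 8
print(p_eq_3, p_le_3, p_lt_3, p_gt_3, p_ge_3, p_5_to_8)
```

Note how strict and non-strict inequalities shift the cutoff by one, because X is a whole number.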
Continuous & Normal Random Variable
For a continuous RV, P(X=3) or any other single value is zero.
Probability is area under curve.
Normal RV is bell curve.
Total area under bell is 1.
P(X<a) is the area under the curve up to x=a.
1 Std. Dev. P(-1<Z<1)=.683
2 Std. Dev. P(-2<Z<2)=.954
3 Std. Dev. P(-3<Z<3)=.997
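The three areas can be recomputed from the standard normal curve. A short sketch using Python's `statistics.NormalDist` (the dictionary name `within` is illustrative):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, std dev 1

# Area under the bell curve between -k and +k standard deviations
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"P(-{k} < Z < {k}) = {area:.3f}")
```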
Pure Numbers
X is a normal random variable with mean 3 and standard deviation 0.7
P(-4<X<3) = normalcdf(-4,3,3,.7)
P(X>2) = normalcdf(2,1E99,3,.7)
P(X<=3.5)=normalcdf(-1E99,3.5,3,.7)
P(X=3) =0
P(|X-3|>.7)=P(X<2.3)+P(X>3.7) = normalcdf(-1E99,2.3,3,.7)+normalcdf(3.7,1E99,3,.7)
A car model gets 24 mpg on the car sticker. The maker knows that mileage is normally distributed with a std dev of 3 mpg. What is the proportion of cars that get less than 20 mpg? P(X<20)=normalcdf(-1E99,20,24,3)=0.091
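A Python stand-in for the calculator's normalcdf, sketched with `statistics.NormalDist`. The function name and argument order (lower, upper, mean, std dev) are chosen here to mirror the calculator entry; this is not a library function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# X normal with mean 3 and std dev 0.7
p_above_2 = normalcdf(2, 1e99, 3, 0.7)
p_at_most_3_5 = normalcdf(-1e99, 3.5, 3, 0.7)
p_far_from_mean = normalcdf(-1e99, 2.3, 3, 0.7) + normalcdf(3.7, 1e99, 3, 0.7)

# Car mileage: mean 24 mpg, std dev 3 mpg, proportion below 20 mpg
p_below_20_mpg = normalcdf(-1e99, 20, 24, 3)
print(p_above_2, p_at_most_3_5, p_far_from_mean, p_below_20_mpg)
```

As on the calculator, -1e99 and 1e99 stand in for negative and positive infinity.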
Conditional Probability
P(A|B) is the probability of A given that B happened.
P(A|B) = P(A intersect B)/P(B)
If P(A|B) = P(A) then A and B are independent, and
P(A intersect B) = P(A)P(B)
If mutually exclusive, P(A|B)=0
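A small worked example with two dice, sketched in Python with exact fractions. The events are invented for illustration: one pair is dependent, one pair is independent.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls

def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A|B) = P(A intersect B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def first_is_six(o):
    return o[0] == 6

def sum_at_least_ten(o):
    return o[0] + o[1] >= 10

def second_is_even(o):
    return o[1] % 2 == 0

p_dependent = cond_prob(first_is_six, sum_at_least_ten)   # differs from P(A)
p_independent = cond_prob(first_is_six, second_is_even)   # equals P(A)
print(prob(first_is_six), p_dependent, p_independent)
```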
Central Limit Theorem
As n becomes large, the sample mean will be distributed approximately according to the normal distribution with mean mu and standard deviation sigma/sqrt(n), where mu and sigma are the population mean and std dev.
*As n gets large, the spread in the sample mean distribution narrows. This means that the sample mean is more likely to be near the true mean. 
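The narrowing can be seen in a quick simulation. This sketch rolls a fair die (an arbitrary choice of population) and compares the spread of the sample mean for n=4 and n=25 against sigma/sqrt(n); the seed and trial count are arbitrary.

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
faces = [1, 2, 3, 4, 5, 6]
sigma = statistics.pstdev(faces)  # std dev of a single die roll, about 1.71

def sample_mean_spread(n, trials=5000):
    """Std dev of the sample mean across many simulated samples of size n."""
    means = [statistics.fmean(rng.choices(faces, k=n)) for _ in range(trials)]
    return statistics.pstdev(means)

spread_4 = sample_mean_spread(4)
spread_25 = sample_mean_spread(25)
print(spread_4, sigma / 4 ** 0.5)
print(spread_25, sigma / 25 ** 0.5)
```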
>INFERENTIAL STATISTICS<
Z-test: test statistic approximated by the normal distribution. Use if the sample size is large or the population variance is known.
T-score/test is used when:
 sample size is below 30
 population standard deviation is unknown (estimated from your sample data)
otherwise use z-score/test.
Generally use 95% confidence level.
Z-stat represents how many std. deviations away from the hypothesized mean the sample mean is.
Std. dev of the sample mean is std dev/sqrt(n)
The null hypothesis assumes that whatever you are trying to prove did not happen.
p-value of 0.03 means there is a 3% chance of finding a difference as large as or larger than the one in your study, given that the null hypothesis is true.
If the p-value is 0.05 or less, you typically reject the null hypothesis.
Type 1 error: rejecting the null hypothesis when it is true
Type 2 error: failing to reject the null hypothesis when it is false
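A sketch of a one-sample, two-sided z-test in Python. The numbers are hypothetical (a claimed mean of 24, a known std dev of 3, and a sample of 36 averaging 23), and `z_test` is a helper written for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for the null hypothesis: mean == mu0."""
    standard_error = sigma / sqrt(n)            # std dev of the sample mean
    z = (sample_mean - mu0) / standard_error    # std deviations from mu0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data, for illustration only
z, p = z_test(sample_mean=23.0, mu0=24.0, sigma=3.0, n=36)
reject_null = p <= 0.05
print(z, p, reject_null)
```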
Derivatives and Tangent Lines
The derivative of a function f at x is the slope of the tangent at x. If all of the slopes are assembled you get f'(x) or df/dx.
If we know f'(a) and f(a), then the tangent line at x=a is y=f'(a)(x-a)+f(a)
Slope is f'(a) and line passes through (a,f(a))
To approximate the slope, use: nDeriv
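A numerical slope can also be sketched in Python with a symmetric difference quotient. This is one common approach and is assumed here for illustration; the function names and step size h are choices made for this sketch, not the calculator's internals.

```python
def n_deriv(f, a, h=1e-5):
    """Approximate f'(a) with a symmetric difference quotient."""
    return (f(a + h) - f(a - h)) / (2 * h)

def tangent_line(f, a):
    """Return the tangent line y = f'(a)(x - a) + f(a) as a function of x."""
    slope = n_deriv(f, a)
    return lambda x: slope * (x - a) + f(a)

def f(x):
    return x ** 2

slope_at_3 = n_deriv(f, 3.0)       # exact derivative is 2 * 3 = 6
tangent = tangent_line(f, 3.0)
print(slope_at_3, tangent(3.0), tangent(4.0))
```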
Computing Areas & Integrals
The definite integral of f from a to b is the area underneath the curve from a to b.
Where f is negative, the area contributed is a negative area.
Use fnInt 
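A rough Python counterpart to a numerical integral, sketched with Simpson's rule (a choice made for this example, not necessarily what the calculator does). The second call shows negative area cancelling positive area.

```python
import math

def integrate(f, a, b, n=1000):
    """Approximate the definite integral of f from a to b (Simpson's rule)."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of strips
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

area_one_hump = integrate(math.sin, 0, math.pi)       # all above the axis
area_full_wave = integrate(math.sin, 0, 2 * math.pi)  # positive and negative parts
print(area_one_hump, area_full_wave)
```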
Fundamental Theorem of Calculus
Limits
A function f(x) converges to a limit L at x=a if, for any given error tolerance, we can specify a range of x around a such that for any x in that range (other than a itself), f(x) is within that tolerance of L.
Critical Points
Min: f goes from decreasing to increasing
f' goes from negative to positive.
Max: f goes from increasing to decreasing.
f' goes from positive to negative.
Flat: f continues to change in the same way.
f' does not change sign.
f" gives concavity.
Concave up means second derivative is positive which means first derivative is increasing
Concave down: f"<0, f' decreasing. 
Max/Min Word Problems
10 meters of string. Which rectangle dimensions give the maximum area?
Perimeter: P(l,w) = 2l+2w, P=10
Area: A(l,w)=lw
A(l)=l(5-l)
max is at l=2.5 (so w=2.5)
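A quick numeric check of the string problem, as a sketch: substitute w = 5 - l from the perimeter, then scan candidate lengths in steps of 0.01 (an arbitrary grid) for the largest area.

```python
def area(l):
    """Area of a rectangle with perimeter 10, so w = 5 - l."""
    w = 5 - l
    return l * w

candidates = [i / 100 for i in range(0, 501)]   # l from 0.00 to 5.00
best_l = max(candidates, key=area)
best_area = area(best_l)
print(best_l, 5 - best_l, best_area)
```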
Newton's Method
1. Pick a, initial guess.
2. Compute tangent line approximation: y = f(a)+f'(a)(x-a)
3. Solve y=0 and get x = (f'(a)a-f(a))/f'(a) = a - f(a)/f'(a)
4. Use x for the next guess. Repeat. 
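The four steps can be sketched in Python as follows. The example function x^2 - 2 (whose positive root is sqrt(2)), the tolerance, and the iteration cap are all choices made for this illustration.

```python
def newton(f, f_prime, a, tol=1e-12, max_iter=50):
    """Repeat x = a - f(a)/f'(a) until successive guesses agree within tol."""
    for _ in range(max_iter):
        x = a - f(a) / f_prime(a)   # where the tangent line at a crosses y = 0
        if abs(x - a) < tol:
            return x
        a = x                       # use x as the next guess
    return a

root = newton(lambda x: x * x - 2, lambda x: 2 * x, a=1.0)
print(root)
```

The method can fail if f'(a) is zero or the initial guess is poor, so the iteration cap acts as a safeguard.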
