Cheatography

# SOCI 271 Cheat Sheet by clarekirk

Social Statistics - Midterm 2

### Probability and Inferential Statistics

- Parameter: A number you derive from a population
- Statistic: A number you derive from a sample
- Census: A survey of the whole population

### Probability & Non-Probability Samples

- Probability Samples: Every case in the population has the same chance of being selected
- Non-Probability Samples: A specific group is used as your sample, e.g., surveying students enrolled in a class

### Example

We want to know what % of students work during the semester.

- We draw a sample of 500 from a list of all students at the university
- N = 20,000 (all students at university)
- P = 500/20,000 (each student's chance of being selected)
- Use a table of random numbers to select 500 ID numbers with 6 digits
- 6-digit numbers are drawn until 500 of them match up with student numbers
- After questioning each of these 500 students, we find that 368 (74%) work during the semester

Summary:
- Population: 20,000
- Sample: 500
- Statistic: 74%
- Parameter: Doesn't directly appear (it's implicit): the % of all students in the population who held a job
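The sampling step can be sketched in Python. This is only an illustration: the ID list is made up (the source gives no real IDs), the seed is arbitrary, and `random.sample` stands in for the table of random numbers.

```python
import random

N = 20_000   # population size (all students), from the example
n = 500      # sample size, from the example

# Hypothetical 6-digit student IDs; the real ID list is not given in the source.
rng = random.Random(271)  # fixed seed so the sketch is repeatable
population_ids = rng.sample(range(100_000, 1_000_000), N)

# Simple random sample: every student has the same chance (n / N) of selection.
sample_ids = rng.sample(population_ids, n)

selection_probability = n / N      # 500 / 20,000
workers_in_sample = 368            # count reported in the example
statistic = workers_in_sample / n  # sample proportion, rounded to 74% on the sheet

print(f"P(selection) = {selection_probability:.3f}")
print(f"Sample statistic = {statistic:.1%}")
```

The parameter never shows up in the code, which matches the example: we only ever observe the statistic.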

### Sampling Variation

- Sample Statistics: Variables (e.g., sample mean, sample proportion) whose values change from sample to sample
- Sampling Error: The sample will differ from the population purely by chance
- Positive Sampling Error: Makes the statistic exceed the population parameter
- Negative Sampling Error: Makes the statistic less than the population parameter

Sample statistic = population parameter + sampling error

Sampling Distribution
The theoretical, probabilistic distribution of a statistic for all possible samples of a given size (n).

### Construction of a Sampling Distribution

Statistic is used to estimate a parameter.
Not all statistics will have the same value.
What is the distribution of the values that we can get for the statistic?

Standard Error = population standard deviation / square root of the sample size (n)
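A small simulation can make the sampling distribution concrete. The sketch below assumes a made-up population (mean near 100, standard deviation near 15) and uses 2,000 repeated samples as a stand-in for "all possible samples"; it then compares the spread of the sample means with σ/√n.

```python
import math
import random
import statistics

rng = random.Random(0)  # arbitrary seed, for repeatability only

# Hypothetical population of 20,000 values (not from the source).
population = [rng.gauss(100, 15) for _ in range(20_000)]
sigma = statistics.pstdev(population)  # population standard deviation

n = 100              # size of each sample
num_samples = 2_000  # repeated samples approximating the sampling distribution

# Draw repeated samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)
]

empirical_se = statistics.pstdev(sample_means)  # SD of the simulated sampling distribution
theoretical_se = sigma / math.sqrt(n)           # sigma / sqrt(n)

print(f"empirical SE   = {empirical_se:.3f}")
print(f"theoretical SE = {theoretical_se:.3f}")
```

The two numbers should land close to each other, and the mean of `sample_means` should sit close to the population mean.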

### Practice Question

- The average age for a population of doctors in a hospital is 51.6 years. What does this mean value represent? A parameter
- What does it mean for a sample to be representative? The sample reproduces the important characteristics of the population
- Which set of symbols represents the standard deviation of the sampling distribution? σx̄
- Which of these terms is synonymous with the standard error of the mean? The standard deviation of a sampling distribution

### Two Estimation Procedures

- Point Estimate: A sample statistic used to estimate a population parameter
- Confidence Intervals: Consist of a range of values instead of a single point

Example of point estimate:
50% of Canadian drivers drive less due to high gas prices.

Example of confidence interval:
Between 47% and 53% of Canadian drivers drive less due to high gas prices.

Confidence Intervals
- Point estimate is in the middle
- Lower and upper bound of C.I: 47% and 53%
- Margin of Error: radius (half the width) of the confidence interval (3%)
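As a quick check of how the pieces fit together, the point estimate and margin of error can be recovered from the bounds in the gas-price example. The variable names are just for illustration.

```python
lower, upper = 47.0, 53.0  # interval bounds from the example, in percent

point_estimate = (lower + upper) / 2   # midpoint of the interval
margin_of_error = (upper - lower) / 2  # half the interval width

print(point_estimate, margin_of_error)  # 50.0 3.0
```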

### Criteria for Choosing Estimators

- Bias: An estimator is unbiased if the mean of its sampling distribution is equal to the population value of interest
- Efficiency: The extent to which the sampling distribution is clustered around its mean

### Bias

If n is large, we know that the sampling distribution of the sample mean/proportion is approximately normal, with a mean equal to the population parameter, so the chances are:

Very good (68 out of 100 chances) that our sample outcome is within +/- 1 standard deviation of the true population parameter

Excellent (95 out of 100) that it is within +/- 2 standard deviations

In less than 1% of cases, a sample outcome will lie further away than +/- 3 standard deviations
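These chances come from the normal curve: roughly 68%, 95%, and 99.7% of outcomes fall within 1, 2, and 3 standard deviations of the mean. A minimal sketch using Python's standard library (`prob_within` is a helper name made up for this illustration):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_within(k: float) -> float:
    """Probability that a normal outcome lies within +/- k standard deviations."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {prob_within(k):.4f}")
```

The leftover beyond +/- 3 standard deviations is about 0.3%, which is the "less than 1% of cases" noted above.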

### Efficiency

Getting back to the matter of dispersion: standard error σx̄ (standard deviation of the sampling distribution) = σ/(√n)

Standard error is an inverse function of n: as sample size increases, σx̄ will decrease

The smaller the standard deviation of a sampling distribution, the greater the clustering and the higher the efficiency.
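To see the inverse relationship with n, the sketch below evaluates σ/√n for a few sample sizes. The value σ = 15 is an assumed population standard deviation chosen for illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 15.0  # hypothetical population standard deviation
for n in (100, 400, 1600):
    print(n, standard_error(sigma, n))
```

Each time n is quadrupled, the standard error is cut in half (1.5, 0.75, 0.375), so efficiency gains get more expensive as the sample grows.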

### Constructing Confidence Intervals

1. Set the alpha (α)
2. Find the Z score (or critical value) associated with alpha
3. Construct the confidence interval (substitute values into the appropriate confidence interval formula)

### Constructing Confidence Intervals - Set the Alpha

Alpha = the probability that the interval will be wrong, i.e., it doesn't include the population parameter.

The commonly used alpha level 0.05 corresponds to a 95% confidence level.

If an infinite number of intervals were constructed at the 0.05 alpha level (all other things being equal), 95% of them would contain the population value; 5% would not.
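The "infinite number of intervals" idea can be approximated by simulation. The sketch below assumes a made-up normal population (μ = 100, σ = 15), builds 2,000 intervals at alpha = 0.05 with a known σ, and counts how many contain μ.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(42)   # arbitrary seed, for repeatability only
mu, sigma = 100.0, 15.0   # hypothetical population mean and SD
n = 100                   # size of each sample
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
margin = z_crit * sigma / math.sqrt(n)        # margin of error for each interval

trials = 2_000
hits = 0
for _ in range(trials):
    sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
    # Does this interval contain the population mean?
    if sample_mean - margin <= mu <= sample_mean + margin:
        hits += 1

coverage = hits / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should come out near 0.95; it will not be exactly 0.95 because only a finite number of intervals is built.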

### Constructing Confidence Intervals - Find Z Score

For an interval estimate based on +/-1.96 Z's:

The probabilities are that 95% of all such intervals will include or overlap the population value

We can be 95% confident that the interval around our one sample outcome contains the population value
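Instead of a Z table, the critical value for a given alpha can be looked up with the standard library. This sketch assumes a two-tailed interval, so alpha is split evenly between the tails; `z_critical` is a helper name made up for this illustration.

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Two-tailed critical Z: leaves alpha / 2 in each tail of the normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> Z = +/-{z_critical(alpha):.3f}")
```

For alpha = 0.05 this gives the familiar +/-1.96.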

### Confidence Interval

Point Estimate +/- Margin of Error

Point Estimate +/- (Critical Value * Standard Error)

The margin of error depends on:
(1) the standard error of the statistic AND
(2) a "critical value/Z score" based on the confidence level

### Constructing Confidence Intervals for Proportions

Point Estimate +/- (Critical Value/Z Score x Standard Error)

For large samples only (interval estimation for proportions based on small samples (n < 100) is not covered).

### Example

What proportion of students at your university missed at least one day of classes because of illness last semester?

Out of a random sample of 200, 60 reported having missed classes: Ps = 60/200 = .30
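The sheet stops at the point estimate, so here is one way the interval could be finished at alpha = 0.05. The source does not say which proportion goes into the standard error, so both common choices are shown as assumptions: the sample proportion itself, and the conservative value 0.5. `proportion_ci` and `p_for_se` are names made up for this sketch.

```python
import math
from statistics import NormalDist

def proportion_ci(p_sample, n, alpha=0.05, p_for_se=None):
    """Large-sample interval: p_sample +/- Z * sqrt(p * (1 - p) / n).

    p_for_se is the proportion used inside the standard error; it
    defaults to the sample proportion (an assumption, see lead-in).
    """
    p = p_sample if p_for_se is None else p_for_se
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p_sample - margin, p_sample + margin

ps = 60 / 200  # sample proportion from the example, 0.30

plug_in = proportion_ci(ps, 200)                      # SE based on Ps
conservative = proportion_ci(ps, 200, p_for_se=0.5)   # SE based on 0.5

print("SE from Ps : %.3f to %.3f" % plug_in)
print("SE from 0.5: %.3f to %.3f" % conservative)
```

Either way the interval runs from roughly 23% to 37%; the 0.5 version is slightly wider because 0.5 maximizes p(1 - p).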

### Confidence Intervals for Means

Formula for large samples (n ≥ 100): x̄ +/- Z x (σ/√n)

### Example

You want to estimate the average IQ of a community using a random sample of 200 residents
- with a sample mean IQ of 105
- assuming a population standard deviation for IQ scores of 15
Alpha set at .05 (i.e. we are willing to run a 5% chance of being wrong).

What is the corresponding Z score?
What is the formula?
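A worked sketch of this example, assuming the large-sample formula with a known population standard deviation: sample mean +/- Z x (σ/√n), with Z = 1.96 for alpha = .05. `mean_ci` is a helper name made up for this illustration.

```python
import math
from statistics import NormalDist

def mean_ci(sample_mean, sigma, n, alpha=0.05):
    """Interval for a mean with known sigma: mean +/- Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Values from the example: mean IQ 105, sigma 15, n = 200
lower, upper = mean_ci(105, 15, 200)
print(f"{lower:.2f} to {upper:.2f}")  # 102.92 to 107.08
```

So at the 95% confidence level, the community's average IQ is estimated to lie between about 102.9 and 107.1.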

### Conf

Three differences to Formula 6.1:
- σ is replaced by s
- n is replaced by n–1 to correct for the fact that s is a biased estimator of σ
- Z is replaced by t (the critical value comes from the Student's t distribution)

To construct confidence intervals from sample means when σ is unknown, we must use a different theoretical distribution, called the Student’s t distribution.

### T Distribution

The shape of the t distribution varies as a function of sample size.
- The t distribution is a family of curves; each curve is defined by its degrees of freedom, a value indicating the number of scores in a sample that are “free to vary” when calculating statistics.
- Degrees of freedom: df = n–1.
- As n increases, s becomes a more and more reliable estimator of the population standard deviation (σ), and the t distribution becomes more and more like the Z distribution.

Smaller samples: t distribution is flatter and has heavier tails than Z distribution.

The Z and t distributions are essentially identical when the sample size is greater than 100.

### T-Table Practice

Find t score for alpha = 0.05 for n=30

Answers:
- Degrees of freedom (df = n-1): 30 – 1 = 29
- t score: ±2.045
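The table lookup can be double-checked numerically. Python's standard library has no t distribution, so this sketch integrates the t density with Simpson's rule and finds the two-tailed critical value by bisection. All function names are made up for this illustration, and the search range of 0 to 100 is an assumption that covers ordinary alpha levels.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha: float, df: int) -> float:
    """Two-tailed critical t: leaves alpha / 2 in each tail."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # assumed search range
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"df = 29:   t = +/-{t_critical(0.05, 29):.3f}")
print(f"df = 1000: t = +/-{t_critical(0.05, 1000):.3f}")
```

With df = 29 the result matches the table value of ±2.045, and with a very large df it approaches the Z value of 1.96, which is the convergence described in the T Distribution section.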