
Chapter 4 Cheat Sheet

Define descriptive epidemiology. Describe uses, strengths, and limitations of selected descriptive study designs such as the ecologic study, case report, case series, and cross-sectional survey. Define ratio, proportion, and rate. Distinguish between crude and age-adjusted rates, and be able to calculate age-adjusted rates using either the direct or the indirect method. Define standardized morbidity (or mortality) ratio.

Descri­ptive epidem­iology

Involves observ­ation, defini­tions, measur­ements, and interp­ret­ations
Describes the distribution of health-related states or events by person, place, and time.
1. Providing inform­ation about a disease or condition.
2. Providing clues to identify a new disease or adverse health effect.
3. Identi­fying the extent of the public health problem.
4. Obtaining a descri­ption of the public health problem that can be easily commun­icated.
5. Identi­fying the population at greatest risk.
6. Assisting in planning and resource alloca­tion.
7. Identi­fying avenues for future research that can provide insights about an etiologic relati­onship between an exposure and health outcome.
The research problem, question, and hypotheses are supported by descri­ptive epidem­iology.

Hypotheses are tested using approp­riate study designs and statis­tical methods.

Describing data by person allows identi­fic­ation of the frequency of disease and who is at greatest risk.

Describing data by place (resid­ence, birthp­lace, place of employ­ment, country, state, county, census tract, etc.)

Descri­ptive statistics are a means of organizing and summar­izing data.

Descri­ptive Study Designs

Ecologic study
Description: Aggregate data involved (no information is available for specific individuals).
Strengths: Takes advantage of preexisting data. Relatively quick and inexpensive. Can be used to evaluate programs, policies, or regulations implemented at the ecologic level. Allows estimation of effects not easily measurable for individuals.
Limitations: Susceptible to confounding; exposures and disease or injury outcomes are not measured on the same individuals.
Case study
Description: A snapshot description of a problem or situation for an individual or group; qualitative descriptive research of the facts in chronological order.
Strengths: In-depth description provides clues to identify a new disease or adverse health effect resulting from an exposure or experience. Identifies potential areas of research.
Limitations: Conclusions are limited to the individual, group, and/or context under study; cannot be used to establish a cause-effect relationship.
Cross-sectional survey
Description: All variables are measured at a single point in time. No distinction between potential risk factors and outcomes.
Strengths: Control over study population. Control over measurements. Several associations between variables can be studied at the same time. Short time period required. Complete data collection. Exposure and injury/disease data collected from the same individuals. Questions can be asked to obtain prevalence data.
Limitations: No data on the time relationship between exposure and injury/disease development; no follow-up; potential bias from low response rate; potential measurement bias; higher proportion of long-term survivors; not feasible with rare exposures or outcomes; does not yield incidence or relative risk.
Descri­ptive study designs include case reports and case series, cross-­sec­tional surveys, and explor­atory ecologic designs.
These designs provide a means for obtaining descri­ptive statistics without typically attempting to test particular hypoth­eses.
In an ecologic study, the unit of analysis is the popula­tion. In a case report, case series, or cross-­sec­tional survey, the unit of analysis is the indivi­dual.

Ratios, Propor­tions, and Rates I

In a ratio, the values of x and y are distinct, such that the values of x are not contained in y. The rate base for a ratio is 10^0 = 1. For example, in 2017 in the United States, the leading causes of death for the age group 15–24 were unintentional injury (9,746 in males and 3,695 in females), suicide (5,027 in males and 1,225 in females), and homicide (4,234 in males and 671 in females). Corresponding male-to-female ratios indicate that males were 2.64 times more likely to die from unintentional injury, 4.10 times more likely to commit suicide, and 6.31 times more likely to die from homicide.
A proportion is typically expressed as a percentage, such that the rate base is 10^2 = 100. Thus, for the preceding data, we can say that of deaths involving unintentional injury, 72.5% were male; of suicides, 80.4% were male; and of deaths due to homicide, 86.3% were male.
A rate is a type of frequency measure where the numerator involves nominal data that represent the presence or absence of a health-related state or event. It also incorporates the added dimension of time; it may be thought of as a proportion with the addition that it represents the number of disease states, events, behaviors, or conditions in a population over a specified time period. An incidence rate is the number of new cases of a specified health-related state or event reported during a given time interval divided by the estimated population at risk of becoming a case.
In epidem­iology, it is common to deal with data that indicate whether an individual was exposed to an illness, has an illness, has experi­enced an injury, is disabled, or is dead. Ratios, propor­tions, and rates are commonly used measures for describing dichot­omous data. The general formula for a ratio, propor­tion, or rate is: X/Y *10^Z.
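The general formula can be checked against the 2017 figures quoted above. The following is a minimal sketch; the helper name `measure` and the dictionary layout are assumptions made for illustration, not part of the source material.

```python
# Deaths among ages 15-24, United States, 2017 (counts quoted in the text above).
deaths = {
    "unintentional injury": {"male": 9746, "female": 3695},
    "suicide": {"male": 5027, "female": 1225},
    "homicide": {"male": 4234, "female": 671},
}


def measure(x, y, z=0):
    """General form X / Y * 10^Z, where 10^Z is the rate base (hypothetical helper)."""
    return x * 10 ** z / y


for cause, d in deaths.items():
    # Ratio: the numerator (males) is not contained in the denominator (females).
    ratio = measure(d["male"], d["female"], z=0)
    # Proportion: the numerator (males) is part of the denominator (all deaths).
    pct_male = measure(d["male"], d["male"] + d["female"], z=2)
    print(f"{cause}: male-to-female ratio {ratio:.2f}, {pct_male:.1f}% male")
```

The only difference between the two calls is whether the numerator is included in the denominator and which rate base is applied.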

A mortality rate is the total number of deaths reported during a given time interval divided by the population from which the deaths occurred.
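As a rough illustration of how incidence and mortality rates are computed, the sketch below uses invented counts and a rate base of 100,000. The function name `rate` and every number are hypothetical.

```python
def rate(events, population, per=100_000):
    """Events divided by population, scaled by a rate base (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * per / population


# Hypothetical one-year figures for a single community (illustrative only).
new_cases = 150              # new cases reported during the year
population_at_risk = 60_000  # people who could have become a case
total_deaths = 45            # deaths from all causes during the year
total_population = 75_000    # population from which the deaths occurred

incidence_rate = rate(new_cases, population_at_risk)
mortality_rate = rate(total_deaths, total_population)
print(f"Incidence rate: {incidence_rate:.1f} per 100,000")
print(f"Mortality rate: {mortality_rate:.1f} per 100,000")
```

The two rates share the same arithmetic; they differ in what counts as an event and in which population forms the denominator.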

Ratios, Propor­tions, and Rates II

The attack rate is also called the cumulative incidence rate. It tends to describe diseases or events that affect a larger proportion of the population than those described by the conventional incidence rate.
Outbreak refers to more localized situat­ions, whereas epidemic refers to more widespread disease and, possibly, over a longer period. The invest­igation of the outbreak involved first constr­ucting a line listing of those at the picnic. Each line repres­ented an indivi­dual, with measur­ements taken on age, gender, time the meal was eaten, whether illness resulted, date of onset, time of onset, and whether selected foods were eaten.
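A line listing of the kind described above maps naturally onto a list of records, one per individual. The sketch below uses invented records and a hypothetical food item to compute an overall attack rate and attack rates by exposure; none of the values come from the source.

```python
# Hypothetical line listing: one record per person at the event (illustrative only).
line_listing = [
    {"id": 1, "age": 34, "ate_potato_salad": True, "ill": True},
    {"id": 2, "age": 8, "ate_potato_salad": True, "ill": True},
    {"id": 3, "age": 61, "ate_potato_salad": True, "ill": True},
    {"id": 4, "age": 45, "ate_potato_salad": True, "ill": False},
    {"id": 5, "age": 29, "ate_potato_salad": False, "ill": True},
    {"id": 6, "age": 52, "ate_potato_salad": False, "ill": False},
    {"id": 7, "age": 17, "ate_potato_salad": False, "ill": False},
    {"id": 8, "age": 70, "ate_potato_salad": False, "ill": False},
]


def attack_rate(records):
    """Percentage of the listed people who became ill."""
    return 100 * sum(r["ill"] for r in records) / len(records)


ate = [r for r in line_listing if r["ate_potato_salad"]]
did_not_eat = [r for r in line_listing if not r["ate_potato_salad"]]

print(f"Overall attack rate: {attack_rate(line_listing):.0f}%")
print(f"Among those who ate the item: {attack_rate(ate):.0f}%")
print(f"Among those who did not: {attack_rate(did_not_eat):.0f}%")
```

Comparing attack rates between those who did and did not eat a given food is one common way to use such a listing, though the comparison itself is an addition here and not something the text above spells out.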
Another common measure for describing disease and health­-re­lated events is preval­ence, which is the frequency of existing cases of a health­-re­lated state or event in a given population at a certain time or period.
Period prevalence is the frequency of an existing health­-re­lated state or event during a time period. For example, the period prevalence of arthritis in a given year includes existing cases the first day of the year, along with new (incident) cases diagnosed during the year. Period prevalence is less commonly used than point preval­ence.
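The arthritis example can be sketched with invented numbers. The counts below are hypothetical, and using a single fixed population as the denominator for both measures is a simplifying assumption.

```python
def prevalence(cases, population, per=100):
    """Existing cases divided by population, as a percentage by default."""
    return cases * per / population


# Hypothetical figures for one community and one calendar year (illustrative only).
existing_cases_on_jan_1 = 1_200  # cases already present on the first day of the year
new_cases_during_year = 300      # incident cases diagnosed during the year
population = 20_000

point_prevalence = prevalence(existing_cases_on_jan_1, population)
period_prevalence = prevalence(existing_cases_on_jan_1 + new_cases_during_year, population)

print(f"Point prevalence on January 1: {point_prevalence:.1f}%")
print(f"Period prevalence for the year: {period_prevalence:.1f}%")
```

Period prevalence is at least as large as point prevalence at the start of the period, because it adds the incident cases to the cases that already existed.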
The crude rate of an outcome is calculated without any restri­ctions, such as by age or gender or weighted adjustment of group-­spe­cific rates; however, these rates are limited if the epidem­iol­ogist is trying to compare them between subgroups of the population or over time because of potential confou­nding influe­nces, such as differ­ences in the age distri­bution between groups.
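One way to see why crude rates can mislead is to compute a crude rate and a directly age-adjusted rate side by side. The sketch below assumes the direct method means applying the community's age-specific rates to a standard population; all counts, age groups, and names are hypothetical.

```python
# Hypothetical community: deaths and population by age group (illustrative only).
community = {
    "<40": {"events": 20, "population": 40_000},
    "40-64": {"events": 60, "population": 20_000},
    "65+": {"events": 200, "population": 10_000},
}
# Hypothetical standard population for the same age groups.
standard = {"<40": 50_000, "40-64": 30_000, "65+": 20_000}


def crude_rate(strata, per=100_000):
    """Total events over total population, ignoring age structure."""
    events = sum(s["events"] for s in strata.values())
    population = sum(s["population"] for s in strata.values())
    return events * per / population


def direct_adjusted_rate(strata, standard_population, per=100_000):
    """Apply each age-specific rate to the standard population, then pool."""
    expected = sum(
        strata[age]["events"] / strata[age]["population"] * standard_population[age]
        for age in standard_population
    )
    return expected * per / sum(standard_population.values())


print(f"Crude rate: {crude_rate(community):.0f} per 100,000")
print(f"Age-adjusted rate (direct): {direct_adjusted_rate(community, standard):.0f} per 100,000")
```

In this invented example the standard population is older than the community, so the adjusted rate comes out higher than the crude rate. Two communities adjusted to the same standard can then be compared without the difference in age distribution confounding the comparison.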
A confidence interval is the range of values in which the population rate is likely to fall.
■ SMR = 1: The health­-re­lated states or events observed were the same as expected from the age-sp­ecific rates in the standard popula­tion.

■ SMR > 1: More health­-re­lated states or events were observed than expected from the age-sp­ecific rates in the standard popula­tion.

■ SMR < 1: Fewer health­-re­lated states or events were observed than expected from the age-sp­ecific rates in the standard popula­tion.
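The three SMR cases above follow from dividing observed events by expected events. The sketch below assumes the indirect method, in which the standard population's age-specific rates are applied to the study population's age structure; all rates, counts, and function names are hypothetical.

```python
# Hypothetical age-specific death rates in the standard population, per 100,000.
standard_rates_per_100k = {"<40": 100, "40-64": 500, "65+": 2_000}
# Hypothetical study population by age group and its observed death count.
study_population = {"<40": 10_000, "40-64": 5_000, "65+": 2_000}
observed_deaths = 90


def expected_events(rates_per_100k, population):
    """Events expected if the study population experienced the standard rates."""
    return sum(rates_per_100k[age] * population[age] / 100_000 for age in population)


def interpret_smr(smr):
    """Map an SMR to the three cases listed in the text."""
    if smr > 1:
        return "more events observed than expected"
    if smr < 1:
        return "fewer events observed than expected"
    return "observed events equal expected events"


expected_deaths = expected_events(standard_rates_per_100k, study_population)
smr = observed_deaths / expected_deaths
print(f"Expected: {expected_deaths:.0f}, observed: {observed_deaths}, SMR = {smr:.2f}")
print(interpret_smr(smr))
```

The indirect method only needs the study population's age structure and its total observed count, which is why it is often used when age-specific counts in the study population are small or unavailable.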

Tables, Graphs, and Numerical Measures

A frequency distri­bution is a complete summary of the freque­ncies, or number of times each value appears.
Relative frequency is derived by dividing the number of people in each group by the total number of people.
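A frequency distribution and its relative frequencies can be tallied in a few lines. The survey responses below are invented for illustration.

```python
from collections import Counter

# Hypothetical nominal data: smoking status of 10 survey respondents.
status = ["never", "former", "current", "never", "never",
          "current", "former", "never", "never", "current"]

# Frequency distribution: number of times each value appears.
frequency = Counter(status)

# Relative frequency: each group's count divided by the total count.
total = sum(frequency.values())
relative_frequency = {value: count / total for value, count in frequency.items()}

for value, count in frequency.most_common():
    print(f"{value:<8} n = {count:<3} relative frequency = {relative_frequency[value]:.2f}")
```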
Bar charts are often used for graphi­cally displaying a frequency distri­bution that involves nominal or ordinal data.
A histogram shows a frequency distri­bution for discrete or continuous data. The horizontal axis displays the true limits of the selected intervals.
A frequency polygon is a graphical display of a frequency table.
An epidemic curve is a histogram that shows the course of an epidemic by plotting the number of cases by time of onset.
A stem-a­nd-leaf plot is a display that organizes data to show their distri­bution. Each data value is split into a “stem” and a “leaf.”
A box plot has a single axis and presents a summary of the data.
A two-way (or bivariate) scatter plot is used to depict the relati­onship between two distinct discrete or continuous variables.
A spot map is used to display the location of each health­-re­lated state or event that occurs in a defined place and time.
A line graph is similar to a two-way scatter plot in that it depicts the relati­onship between two continuous variables.
Measures of central tendency refer to ways of design­ating the center of the data. The most common measures are the arithmetic mean, geometric mean, median, and mode.
The arithmetic mean is the most familiar measure of central location and has many desirable statistical properties; it is the arithmetic average of a distribution of data.
The geometric mean is calculated as the nth root of the product of n observ­ations. It is used when the logarithms of the observ­ations are normally distri­buted.
Median is the number or value that divides a list of numbers in half; it is the middle observ­ation in the data set. It is less sensitive to outliers than the mean.
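The four measures of central tendency can be compared on a small skewed data set. The values below are hypothetical, and the sketch relies on Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later).

```python
import statistics

# Hypothetical incubation periods in days; the last value is an outlier.
incubation_days = [2, 3, 3, 4, 5, 6, 30]

arithmetic_mean = statistics.mean(incubation_days)
geometric_mean = statistics.geometric_mean(incubation_days)  # nth root of the product
median = statistics.median(incubation_days)
mode = statistics.mode(incubation_days)

print(f"Arithmetic mean: {arithmetic_mean:.2f}")
print(f"Geometric mean:  {geometric_mean:.2f}")
print(f"Median:          {median}")
print(f"Mode:            {mode}")
```

With one large outlier in the data, the arithmetic mean is pulled upward while the median stays near the bulk of the observations, which is the sensitivity difference noted above.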
Range is the difference between the largest (maximum) and smallest (minimum) values of a frequency distri­bution.
Interq­uartile range is the difference between the third quartile (75th percen­tile) and the first quartile (25th percen­tile). Note that the distri­bution of data consists of four quarters.
Variance is the average of the squared differ­ences of the observ­ations from the mean.
Standard deviation is the square root of the variance. The standard deviation has mathem­atical properties that are useful in constr­ucting the confidence interval for the mean and in statis­tical tests for evaluating research hypoth­eses.
The coeffi­cient of variation is a measure of relative spread in the data. It is a normalized measure of dispersion of a probab­ility distri­bution that adjusts the scales of variables so that meaningful compar­isons can be made.
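The measures of spread defined above can be computed with the standard `statistics` module. The data are hypothetical. Two assumptions are worth flagging: the variance here divides by n, matching the "average of the squared differences" wording (a sample variance would divide by n - 1), and quartile conventions differ between tools, so the interquartile range may not match other software exactly.

```python
import statistics

# Hypothetical measurements (illustrative only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
data_range = max(values) - min(values)

# Quartiles using the module's default method (Python 3.8+).
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

variance = statistics.pvariance(values)  # average squared deviation from the mean
std_dev = statistics.pstdev(values)      # square root of the variance
cv_percent = std_dev / mean * 100        # spread relative to the mean

print(f"Range: {data_range}, IQR: {iqr}")
print(f"Variance: {variance}, standard deviation: {std_dev}")
print(f"Coefficient of variation: {cv_percent:.1f}%")
```

Because the coefficient of variation is unitless, it can be compared across variables measured on different scales.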

Measures of Associ­ation

Coefficient of determination (denoted by r², the square of the correlation coefficient)
Represents the proportion of the total variation in the dependent variable that is determined by the independent variable. If a perfect positive or negative association exists, then all the variation in the dependent variable would be explained by the independent variable. Generally, however, only part of the variation in the dependent variable can be explained by a single independent variable.
Spearman’s rank correl­ation coeffi­cient (denoted by rs)
An altern­ative to the Pearson correl­ation coeffi­cient when outlying data exist, such that one or both of the distri­butions are skewed. This method is robust to outliers.
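To illustrate the robustness to outliers, the sketch below computes the Pearson coefficient, its square, and Spearman's rank coefficient on invented data with one outlier. The functions are written from the textbook definitions as an illustration, with Spearman's coefficient taken as the Pearson coefficient of the ranks; the data and all names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired observations; the last y value is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 60]

r = pearson_r(x, y)
r_s = spearman_rs(x, y)
print(f"Pearson r = {r:.3f}, r squared = {r ** 2:.3f}")
print(f"Spearman r_s = {r_s:.3f}")
```

The relationship in this data is perfectly monotonic, so the rank-based coefficient is unaffected by the outlier, whereas the Pearson coefficient is pulled well below 1.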
Simple regression model y = b0 + b1x1
A statistical analysis that provides an equation that estimates the change in the dependent variable (y) per unit change in an independent variable (x). This method assumes that for each value of x, y is normally distributed; that the standard deviation of the outcomes y does not change over x; that the outcomes y are independent; and that a linear relationship exists between x and y.
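The coefficients b0 and b1 of the simple regression model can be estimated by ordinary least squares. The sketch below uses the closed-form estimates on invented data; the function name and the values are hypothetical.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx             # estimated change in y per unit change in x
    b0 = mean_y - b1 * mean_x  # estimated value of y when x is 0
    return b0, b1


# Hypothetical data: x is an exposure level, y is a measured outcome.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_regression(x, y)
print(f"y = {b0:.2f} + {b1:.2f} * x")
```

The slope b1 is the quantity of interest in the description above: the estimated change in the dependent variable for each one-unit increase in the independent variable.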
Multiple regression y = b0 + b1x1 + … + bkxk
An extension of simple regression analysis in which there are two or more indepe­ndent variables. The effects of multiple indepe­ndent variables on the dependent variable can be simult­ane­ously assessed. This type of model is useful for adjusting for potential confou­nders.
Logistic regression Log(odds) = b0 + b1x1
A type of regression in which the dependent variable is a dichot­omous variable. Logistic regression is commonly used in epidem­iology because many of the outcome measures considered involve nominal data.
Multiple logistic regression Log(odds) = b0 + b1x1 + … + bkxk
An extension of logistic regression in which two or more indepe­ndent variables are included in the model. It allows the researcher to look at the simult­aneous effect of multiple indepe­ndent variables on the dependent variable. As in the case of multiple regres­sion, this method is effective in contro­lling for confou­nding factors.
A contingency table is a table in which all entries are classified by each of the variables in the table. For example, suppose we were interested in assessing whether exposure to a dietary intervention (yes vs. no) is associated with a decrease in low-density lipoprotein (yes vs. no). A 2 × 2 contingency table could represent the data.
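The dietary intervention example can be sketched by tallying individual records into a 2 × 2 table. All counts below are invented. The last step connects the table to the logistic model Log(odds) = b0 + b1x1: with a single yes/no independent variable, b0 equals the log odds in the unexposed group and b1 equals the difference in log odds between the groups. That connection is a standard result added here for illustration and is not stated in the text above.

```python
import math
from collections import Counter

# Hypothetical records: (exposed to intervention, LDL decreased) per participant.
records = (
    [(True, True)] * 30 + [(True, False)] * 20
    + [(False, True)] * 15 + [(False, False)] * 35
)

table = Counter(records)
a, b = table[(True, True)], table[(True, False)]    # exposed: decrease yes / no
c, d = table[(False, True)], table[(False, False)]  # unexposed: decrease yes / no

print("                 LDL decrease   No decrease")
print(f"Intervention     {a:<14} {b}")
print(f"No intervention  {c:<14} {d}")

pct_exposed = 100 * a / (a + b)
pct_unexposed = 100 * c / (c + d)
print(f"LDL decreased in {pct_exposed:.0f}% of exposed and {pct_unexposed:.0f}% of unexposed")

# Logistic model with one binary x: log(odds) = b0 + b1 * x.
b0 = math.log(c / d)              # log odds of a decrease when x = 0 (unexposed)
b1 = math.log((a / b) / (c / d))  # difference in log odds when x = 1 (exposed)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```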


Descri­ptive epidem­iology is used to assess and monitor the health of commun­ities and to identify health problems and priorities according to person (who?), place (where?), and time (when?) factors. It also involves charac­ter­izing the nature of the health problem (what?). Selected descri­ptive study designs, statis­tical measures, and graphs and charts were presented for describing the frequency and pattern of health­-re­lated states or events.
Descriptive analysis is the first step in epidemiology toward understanding the presence, extent, and nature of a public health problem and is useful for formulating research hypotheses. Descriptive studies are hypothesis generating; they provide the rationale for testing specific hypotheses. The analytic study design, which is the focus of a later chapter, involves evaluating directional hypotheses about associations between variables. Some of the same measures and statistical tests used in exploratory and descriptive studies are also used in analytic studies. After a hypothesis is statistically evaluated for significance and an association between variables is deemed not to be explained by chance, bias, or confounding, an investigator can use this information as part of the evidence for establishing a cause–effect relationship. Other criteria must also be considered in making a judgment about causality, including temporality, dose–response relationship, biologic credibility, and consistency among studies.

