General (Stats) Cheat Sheet

Terms

Reliability
Reliability is about the consistency of a measure
Test-retest	The consistency of a measure across time: do you get the same results when you repeat the measurement?
Interrater	The consistency of a measure across raters or observers: do you get the same results when different people conduct the same measurement?
Internal consistency	The consistency of the measurement itself: do you get the same results from different parts of a test that are designed to measure the same thing?
Ensuring reliability	Apply your methods consistently, Standardize the conditions of your research
Validity
Validity is about the accuracy of a measure
Construct	The adherence of a measure to existing theory and knowledge of the concept being measured.
Content	The extent to which the measurement covers all aspects of the concept being measured.
Criterion	The extent to which the result of a measure corresponds to other valid measures of the same concept.
Ensuring validity	Choose appropriate methods of measurement, Use appropriate sampling methods to select your subjects

Quantitative Data

Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations.
Research Methods:
descriptive research	you simply seek an overall summary of your study variables.
correlational research	you investigate relationships between your study variables.
experimental research	you systematically examine whether there is a cause-and-effect relationship between variables.
Advantages:	Replication, Direct comparison of results, Large Samples, Hypothesis testing
Disadvantages:	Superficiality, Narrow focus, Structural bias, Lack of context

Qualitative Data

Qualitative research involves collecting and analyzing non-numerical data (e.g., text, video, or audio) to understand concepts, opinions, or experiences. It can be used to gather in-depth insights into a problem or generate new ideas for research.
Research Methods:
Observations:	recording what you have seen, heard, or encountered in detailed field notes.
Interviews:	personally asking people questions in one-on-one conversations.
Focus groups:	asking questions and generating discussion among a group of people.
Surveys:	distributing questionnaires with open-ended questions.
Secondary research:	collecting existing data in the form of texts, images, audio or video recordings, etc.
Advantages:	Flexibility, Natural setting, Meaningful insights, Generation of new ideas
Disadvantages:	Unreliability, Subjectivity, Limited generalizability, Labor-intensive

Descriptive Statistics

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.
3 main types of descriptive statistics:
1. The distribution concerns the frequency of each value (often shown in graphs).
2. Measures of central tendency concern the averages of the values.
- Mean	To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N.
- Median	To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.
- Mode	To find the mode, order your data set from lowest to highest and find the response that occurs most frequently. A data set can have more than one mode, or no mode at all.
3. Measures of variability or dispersion concern how spread out the values are.
- Range	To find the range, simply subtract the lowest value from the highest value.
- Standard Deviation	The standard deviation (s) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.
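- Variance	The variance (s²) is the average of squared deviations from the mean; for a sample, the sum of squared deviations is divided by N − 1 rather than N. Variance reflects the degree of spread in the data set: the more spread out the data, the larger the variance. The standard deviation is the square root of the variance.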
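As a minimal sketch of the three types of descriptive statistics above, the following uses Python's standard `statistics` module on a made-up sample. The values in `scores` are hypothetical, and `statistics.variance` and `statistics.stdev` are the sample versions (dividing by N − 1), which is an assumption about which variant is wanted.

```python
import statistics
from collections import Counter

# Hypothetical sample of N = 8 responses (illustrative values only).
scores = [2, 4, 4, 5, 7, 7, 7, 12]
n = len(scores)

# 1. Distribution: how often each value occurs.
frequency = Counter(scores)

# 2. Central tendency.
mean = sum(scores) / n                # sum of responses divided by N
median = statistics.median(scores)    # mean of the two middle values when N is even
mode = statistics.mode(scores)        # most frequent response

# 3. Variability.
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by N - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(dict(frequency))
print(mean, median, mode, value_range)
print(round(variance, 3), round(std_dev, 3))
```

If the data set is an entire population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by N instead.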
Univariate descriptive statistics
Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread.
Bivariate descriptive statistics
If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them. In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.
Multivariate analysis
Multivariate analysis is the same as bivariate analysis but with more than two variables.
Contingency table
In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.
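A contingency table can be sketched in a few lines by counting how often each pair of categories occurs. The observations below are invented for illustration, with gender as the row (independent) variable and activity as the column (dependent) variable.

```python
from collections import Counter

# Hypothetical observations: one (gender, activity) pair per respondent.
observations = [
    ("female", "reading"), ("female", "sports"), ("female", "reading"),
    ("male", "sports"), ("male", "sports"), ("male", "reading"),
]

counts = Counter(observations)
rows = sorted({gender for gender, _ in observations})
cols = sorted({activity for _, activity in observations})

# Each cell is the number of respondents at the intersection of a row and a column.
table = {row: {col: counts[(row, col)] for col in cols} for row in rows}

# Print the table so that it can be read "across" each row.
print("gender".ljust(8) + "".join(col.rjust(9) for col in cols))
for row in rows:
    print(row.ljust(8) + "".join(str(table[row][col]).rjust(9) for col in cols))
```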
Scatter plots
A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.
In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

Inferential Statistics

Inferential statistics help you come to conclusions and make predictions based on your data, so that you can understand the larger population from which the sample is taken. It’s important to use random and unbiased sampling methods: if your sample isn’t representative of your population, you can’t make valid statistical inferences.
Inferential statistics have two main uses:
	• making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
	• testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).
Sampling error
Since the size of a sample is always smaller than the size of the population, some of the population isn’t captured by sample data. This creates sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).
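To make the idea concrete, here is a small simulation sketch. The population (10,000 normally distributed scores with mean 100 and standard deviation 15), the sample size of 50, and the random seed are all arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores.
population = [random.gauss(100, 15) for _ in range(10_000)]

# Simple random sample of 50 members, drawn without replacement.
sample = random.sample(population, 50)

parameter = statistics.mean(population)  # true population value
statistic = statistics.mean(sample)      # measured sample value
sampling_error = statistic - parameter

print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:.2f}")
```

Drawing a different sample (for example, by changing the seed) gives a different sampling error, which is why estimates are usually reported with a measure of uncertainty.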
Two important types of estimates you can make about the population:
point estimate	is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
interval estimate	gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.
- confidence interval	uses the variability around a statistic to come up with an interval estimate for a parameter. Confidence intervals are useful for estimating parameters because they take sampling error into account. The confidence interval tells you the uncertainty of the point estimate. The confidence level tells you the percentage of such intervals that would contain the parameter if you repeated the study many times. A 95% confidence level means that if you repeat your study with a new sample in exactly the same way 100 times, you can expect about 95 of the resulting intervals to contain the true parameter.
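The sketch below computes a confidence interval for a mean and then checks its coverage by simulation. It assumes a normal approximation (a z critical value from `statistics.NormalDist`) instead of a t-distribution, which is reasonable only for moderately large samples. The helper `confidence_interval`, the population parameters, and the seed are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)          # about 1.96 for 95%
    margin = z * stdev(sample) / math.sqrt(len(sample))
    center = mean(sample)                              # the point estimate
    return center - margin, center + margin

random.seed(0)
TRUE_MEAN, TRUE_SD = 100, 15   # assumed population parameters

# Repeat the "study" many times and count how often the interval
# contains the true population mean.
repeats, hits = 1000, 0
for _ in range(repeats):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(50)]
    low, high = confidence_interval(sample)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / repeats
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share printed at the end should come out close to the chosen confidence level, which is what the level describes: the long-run proportion of intervals that capture the parameter.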
Hypothesis Testing
Hypothesis testing is a formal process of statistical analysis using inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.
Parametric tests make assumptions that include the following:
	• the population that the sample comes from follows a normal distribution of scores
	• the sample size is large enough to represent the population
	• the variances, a measure of spread, of each group being compared are similar
Non-parametric tests are called “distribution-free tests” because they don’t assume anything about the distribution of the population data.
Comparison tests
Comparison tests assess whether there are differences in the means, medians or rankings of scores of two or more groups.
	t-test, ANOVA, Mood's median, Wilcoxon signed-rank, Mann-Whitney U, Kruskal-Wallis H
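As an illustration of a comparison test, the sketch below computes the test statistic for an independent two-sample t-test. It uses Welch's variant, which does not assume equal variances, and it stops at the statistic and degrees of freedom because the standard library has no t-distribution for the p-value. The function name `welch_t` and both groups of values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical scores for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.6, 4.3]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

In practice, a library routine such as `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` performs the same test and also returns the p-value.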
Correlation tests
Correlation tests determine the extent to which two variables are associated.
	Pearson's r, Spearman's rho, Chi-square test of independence
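The two correlation coefficients can be sketched with the standard library alone. Pearson's r measures linear association, and Spearman's rho is Pearson's r applied to the ranks of the data, so it measures monotonic association. The helper names and the `hours` and `scores` values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: a perfectly monotonic but non-linear relationship.
hours = [1, 2, 3, 4, 5]
scores = [1, 4, 9, 16, 25]

r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the relationship is monotonic but curved, rho reaches 1 while r stays slightly below it.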
Regression tests
Regression tests assess whether changes in predictor variables are associated with changes in an outcome variable; a causal interpretation additionally depends on the study design. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes. Most of the commonly used regression tests are parametric. If your data is not normally distributed, you can perform data transformations.
	Simple linear regression, Multiple linear regression, Logistic regression, Nominal regression, Ordinal regression
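For the simplest case in the list above, a simple linear regression, the least-squares slope and intercept can be computed directly. This is a minimal sketch with a hypothetical helper `fit_line` and invented data that lie exactly on the line y = 2x + 1. It estimates the line only and performs no significance test.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical predictor and outcome values (exactly y = 2x + 1).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x, y)
predicted = slope * 6 + intercept  # prediction for a new predictor value

print(slope, intercept, predicted)
```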

General (Stats) Cheat Sheet (DRAFT) by Robyn.jll
