
General (Stats) Cheat Sheet (DRAFT) by

General Information for Steps 1-5

This is a draft cheat sheet. It is a work in progress and is not finished yet.


Reliability is about the consistency of a measure
Test-retest reliability
The consistency of a measure across time: do you get the same results when you repeat the measurement?
Interrater reliability
The consistency of a measure across raters or observers: do you get the same results when different people conduct the same measurement?
Internal consistency
The consistency of the measurement itself: do you get the same results from different parts of a test that are designed to measure the same thing?
Ensuring reliability
Apply your methods consistently and standardize the conditions of your research.
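Internal consistency is often summarized with Cronbach's alpha, a measure this sheet does not name. The sketch below computes it with Python's standard `statistics` module, assuming every respondent answered every item; the function name `cronbach_alpha` and the scores are hypothetical. Values close to 1 suggest the items measure the same thing consistently.

```python
from statistics import variance

def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a test whose items are given as lists.

    items[i][j] is respondent j's score on item i (k items, N respondents).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs a score from every respondent")
    # Sample variance of each item across respondents.
    item_variances = [variance(item) for item in items]
    # Each respondent's total score across all items.
    totals = [sum(item[j] for item in items) for j in range(n)]
    total_variance = variance(totals)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores: 3 items answered by 5 respondents.
scores = [
    [4, 5, 3, 4, 2],  # item 1
    [4, 4, 3, 5, 2],  # item 2
    [5, 5, 2, 4, 1],  # item 3
]
print(round(cronbach_alpha(scores), 2))  # prints 0.92
```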
Validity is about the accuracy of a measure
Construct validity
The adherence of a measure to existing theory and knowledge of the concept being measured.
Content validity
The extent to which the measurement covers all aspects of the concept being measured.
Criterion validity
The extent to which the result of a measure corresponds to other valid measures of the same concept.
Ensuring validity
Choose appropriate methods of measurement and use appropriate sampling methods to select your subjects.

Quantitative Data

Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations.
Research Methods:
Descriptive research
You seek an overall summary of your study variables.
Correlational research
You investigate relationships between your study variables.
Experimental research
You systematically examine whether there is a cause-and-effect relationship between variables.
Advantages: Replication, Direct comparison of results, Large samples, Hypothesis testing
Disadvantages: Superficiality, Narrow focus, Structural bias, Lack of context

Qualitative Data

Qualitative research involves collecting and analyzing non-numerical data (e.g., text, video, or audio) to understand concepts, opinions, or experiences. It can be used to gather in-depth insights into a problem or generate new ideas for research.
Research Methods:
Observations:
recording what you have seen, heard, or encountered in detailed field notes.
Interviews:
personally asking people questions in one-on-one conversations.
Focus groups:
asking questions and generating discussion among a group of people.
Surveys:
distributing questionnaires with open-ended questions.
Secondary research:
collecting existing data in the form of texts, images, audio or video recordings, etc.
Advantages: Flexibility, Natural setting, Meaningful insights, Generation of new ideas
Disadvantages: Unreliability, Subjectivity, Limited generalizability, Labor-intensive

Descriptive Statistics

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or an entire population.
There are 3 main types of descriptive statistics:
1. The distribution concerns the frequency of each value (often shown in tables or graphs).
2. Measures of central tendency concern the averages of the values.
- Mean
To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N.
- Median
To find the median, order the response values from smallest to largest. The median is the value in the middle. If there are two values in the middle (an even number of responses), take their mean.
- Mode
To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.
3. Measures of variability or dispersion concern how spread out the values are.
- Range
To find the range, subtract the lowest value from the highest value.
- Standard Deviation
The standard deviation (s) is the square root of the variance. It tells you, roughly, how far a typical score lies from the mean. The larger the standard deviation, the more variable the data set is.
- Variance
The variance (s²) is the average of the squared deviations from the mean; for a sample, the sum of squared deviations is divided by N - 1 instead of N. Variance reflects the degree of spread in the data set: the more spread out the data, the larger the variance.
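The measures above can be computed with Python's standard `statistics` module. This is a minimal sketch on a made-up list of responses. Note that `statistics.variance` and `statistics.stdev` use the sample formulas (dividing by N - 1), while `statistics.pvariance` and `statistics.pstdev` divide by N for an entire population.

```python
import statistics

responses = [3, 7, 7, 2, 9, 4, 7, 5]  # hypothetical survey responses

n = len(responses)                         # N, the total number of responses
mean = sum(responses) / n                  # sum of the values divided by N
median = statistics.median(responses)      # middle value; mean of the two middle values if N is even
mode = statistics.mode(responses)          # most frequent value
value_range = max(responses) - min(responses)

# Sample variance written out: squared deviations from the mean, divided by N - 1.
manual_variance = sum((x - mean) ** 2 for x in responses) / (n - 1)
sample_variance = statistics.variance(responses)  # same formula, from the library
sample_sd = statistics.stdev(responses)           # square root of the sample variance

print(f"N={n} mean={mean} median={median} mode={mode} range={value_range}")
# N=8 mean=5.5 median=6.0 mode=7 range=7
print(f"variance={sample_variance:.3f} sd={sample_sd:.3f}")
# variance=5.714 sd=2.390
```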
Univariate descriptive statistics
Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread.
Bivariate descriptive statistics
If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them. In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.
Multivariate analysis
The same as bivariate analysis, but with more than two variables.
Contingency table
In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.
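A contingency table is a count of each combination of two categorical variables. The sketch below builds one with `collections.Counter` from made-up (gender, activity) observations, placing the independent variable down the side and the dependent variable across the top as described above. It is an illustration, not a reporting tool.

```python
from collections import Counter

# Hypothetical observations: (independent variable, dependent variable) pairs.
observations = [
    ("female", "sports"), ("male", "sports"), ("female", "music"),
    ("female", "sports"), ("male", "gaming"), ("male", "sports"),
    ("female", "gaming"), ("male", "music"), ("male", "gaming"),
]

counts = Counter(observations)                # frequency of each (row, column) pair
rows = sorted({g for g, _ in observations})   # independent variable down the side
cols = sorted({a for _, a in observations})   # dependent variable across the top

# One row per group, one column per activity, plus a row total.
# Counter returns 0 for combinations that never occur.
print("".ljust(8) + "".join(c.ljust(8) for c in cols) + "total")
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(r.ljust(8) + "".join(str(x).ljust(8) for x in cells) + str(sum(cells)))
```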
Scatter plots
A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.
In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each observation is represented by a point in the chart.

Inferential Statistics

Inferential statistics help you draw conclusions and make predictions based on your data, so that you can understand the larger population from which the sample is taken. It’s important to use random and unbiased sampling methods: if your sample isn’t representative of your population, you can’t make valid statistical inferences.
Inferential statistics have two main uses:
• making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
• testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).
Sampling error
Since the size of a sample is always smaller than the size of the population, some of the population isn’t captured by sample data. This creates sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).
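Sampling error can be made concrete with a simulation, because there the whole population is known. This sketch draws a hypothetical population of 10,000 scores (normally distributed around 500, an assumption chosen only for illustration), takes one random sample of 100, and compares the sample mean (a statistic) with the population mean (a parameter).

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 simulated test scores.
population = [random.gauss(500, 100) for _ in range(10_000)]
parameter = statistics.fmean(population)   # true population mean (a parameter)

sample = random.sample(population, 100)    # simple random sample without replacement
statistic = statistics.fmean(sample)       # sample mean (a statistic)

# Sampling error: how far the statistic lands from the parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.1f} statistic={statistic:.1f} error={sampling_error:.1f}")
```

Rerunning with a different seed or a larger sample changes the error; on average it shrinks as the sample grows.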
There are two important types of estimates you can make about the population:
Point estimate
A single-value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
Interval estimate
A range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.
- Confidence interval
Uses the variability around a statistic to come up with an interval estimate for a parameter. Confidence intervals are useful for estimating parameters because they take sampling error into account. The confidence interval tells you the uncertainty of the point estimate. The confidence level tells you how often intervals built this way would contain the true parameter if you repeated the study many times. A 95% confidence interval means that if you repeated your study with a new sample in exactly the same way 100 times and computed an interval each time, you could expect about 95 of those intervals to contain the true population parameter.
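The sketch below computes an approximate 95% confidence interval for a mean and then checks the repeated-study interpretation by simulation: it repeats a hypothetical study 1,000 times and counts how many intervals contain the true mean. It uses the normal critical value from `statistics.NormalDist`, which assumes a reasonably large sample; with small samples a t critical value (not available in the standard library) would be more appropriate. The function name `mean_ci` and all numbers are illustrative.

```python
import random
import statistics
from statistics import NormalDist

def mean_ci(sample, level=0.95):
    """Approximate confidence interval for a population mean (normal critical value)."""
    n = len(sample)
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5       # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% level
    return m - z * se, m + z * se

# Repeat a hypothetical study many times and count how often the interval
# captures the true mean (known here only because the population is simulated).
random.seed(1)
true_mean, true_sd, n, repeats = 50.0, 10.0, 50, 1000
hits = 0
for _ in range(repeats):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    low, high = mean_ci(sample)
    if low <= true_mean <= high:
        hits += 1
coverage = hits / repeats
print(f"coverage: {coverage:.3f}")  # should be close to 0.95
```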
Hypothesis Testing
Hypothesis testing is a formal process of statistical analysis using inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.
Parametric tests make assumptions that include the following:
• the population that the sample comes from follows a normal distribution of scores
• the sample size is large enough to represent the population
• the variances (a measure of spread) of the groups being compared are similar
Non-parametric tests are called “distribution-free tests” because they don’t assume that the population data follow a particular distribution (such as the normal distribution).
Comparison tests
Comparison tests assess whether there are differences in the means, medians or rankings of scores of two or more groups.
t-test, ANOVA, Mood’s median, Wilcoxon signed-rank, Mann-Whitney U, Kruskal-Wallis H
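As an example of a parametric comparison test, the sketch below computes Student's independent-samples t statistic, which relies on the equal-variance assumption listed above (it pools the two variances). It only produces t and its degrees of freedom; turning t into a p-value requires the t distribution, which libraries such as SciPy provide (`scipy.stats.ttest_ind(a, b, equal_var=True)` reports this statistic together with a p-value). The function name and group scores are hypothetical.

```python
import statistics
from math import sqrt

def pooled_t_statistic(a, b):
    """Student's independent-samples t statistic (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    return (m1 - m2) / se, n1 + n2 - 2

group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]  # hypothetical scores, group A
group_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.7]  # hypothetical scores, group B
t, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t:.2f}, df = {df}")
```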
Correlation tests
Correlation tests determine the extent to which two variables are associated.
Pearson’s r, Spearman’s r, Chi-square test of independence
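Pearson's r and Spearman's r can both be written in a few lines. Pearson's r divides the covariance of two variables by the product of their standard deviations; Spearman's r is Pearson's r applied to the ranks of the values, which lets it capture monotonic but non-linear relationships. This is a from-scratch sketch for illustration (tied values receive the average of their ranks), and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_r(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

hours = [1, 2, 3, 4, 5, 6]          # hypothetical hours studied
score = [52, 55, 61, 60, 70, 90]    # hypothetical exam scores
print(round(pearson_r(hours, score), 3), round(spearman_r(hours, score), 3))
```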
Regression tests
Regression tests estimate how changes in one or more predictor variables relate to changes in an outcome variable; whether that relationship is causal depends on the study design. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes. Most of the commonly used regression tests are parametric. If your data are not normally distributed, you can perform data transformations.
Simple linear regression, Multiple linear regression, Logistic regression, Nominal regression, Ordinal regression
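Simple linear regression fits a straight line, outcome = intercept + slope × predictor, by ordinary least squares. The sketch below computes the two coefficients directly from sums of deviations from the means; the data are made up, and it does not compute standard errors or p-values. On Python 3.10+, `statistics.linear_regression(x, y)` returns the same slope and intercept.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # squared deviations of x
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean x, mean y)
    return intercept, slope

income = [20, 30, 40, 50, 60]      # hypothetical family income (thousands)
score = [480, 500, 530, 540, 570]  # hypothetical test scores
b0, b1 = simple_linear_regression(income, score)

# Residuals: observed minus predicted; with an intercept they sum to about zero.
predicted = [b0 + b1 * xi for xi in income]
residuals = [yi - pi for yi, pi in zip(score, predicted)]
print(f"score = {b0:.1f} + {b1:.2f} * income")  # score = 436.0 + 2.20 * income
```

The residuals are what the normality assumption of linear regression usually refers to, so they are a natural thing to inspect before deciding on a data transformation.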