Cheatography

Data Analytics Exam 1 Cheat Sheet (DRAFT) by

Data Analytics for Accountants

This is a draft cheat sheet. It is a work in progress and is not finished yet.

IMPACT Model

I: Identify the questions
M: Master the data
P: Perform test plan
A: Address and refine results
C: Communicate insights
T: Track outcomes
Identify
understand the problem that needs addressing
 
Attributes = Audience, Scope, Use
Master
what data is available, and will that data help address the problem?

Need to know about the data:

how to access it

availability

reliability

frequency of updates

time line of data coverage
Perform Test Plan
think of the right approach to the data to be able to answer the question

identify the relationship between the response (dependent) variable and the predictor (explanatory, independent) variables

8 approaches

Classification "sorting into predefined categories"

Regression "a number"

Similarity Matching "similar individuals"

Clustering "finding natural groups"

Co-occurrence grouping "associations based on transactions"

Profiling "the typical"

Link Prediction "relationship between two data items"

Data Reduction "reduces to the most critical information"
Address and refine
Data analysis is iterative
 
slice, dice, and manipulate the data
Communicate
insights are formed by decision makers and are communicated

executive summaries, static reports, digital dashboards, and data visualizations
Track outcomes
can we predict future outcomes
 
then you can test how accurate the predic­tions were

Business Environment and Big Data

Big Data
4 V's
Volume (size)
 
Velocity (speed)
 
Variety (types)
 
Veracity (quality)
Impact of data
Auditing
Audits must embrace technology.

Technology = better quality, transparency, and accuracy in audits.

gathering data rationale behind data queries

expands auditors' capabilities in fraud detection

automating compliance-monitoring activities
Management Accounting
(most similar to analytics)
 
Job of MA:
 
are asked questions by management
 
find data to address those questions
 
analyze the data
 
report the results to management
Financial Reporting
Financial Statement Analysis
Tax

Relational Database

Benefits of (3NF) Relational DB

Completeness

No redundancy

Business rules enforcement (IC)

Communication and integration of business processes
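A minimal sketch of two of these benefits (no redundancy, business rules enforcement) using Python's built-in sqlite3 module. The customer/sale tables and their columns are hypothetical: the customer name is stored once, sales refer to it by key, and a foreign key rejects a sale for a customer that does not exist.

```python
import sqlite3

# Hypothetical normalized schema: customer details live in one table only
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE sale ("
    "sale_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(customer_id), "
    "amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Co')")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", [(10, 1, 250.0), (11, 1, 99.5)])

# Business rule: every sale must belong to an existing customer
try:
    conn.execute("INSERT INTO sale VALUES (12, 99, 10.0)")  # customer 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Join the tables when a report needs the customer name next to each sale
rows = conn.execute(
    "SELECT s.sale_id, c.name, s.amount FROM sale s "
    "JOIN customer c ON c.customer_id = s.customer_id ORDER BY s.sale_id"
).fetchall()
print(rejected)  # True
print(rows)      # [(10, 'Acme Co', 250.0), (11, 'Acme Co', 99.5)]
```

Because the name 'Acme Co' is stored once, changing it means updating one row rather than every sale.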
ETL process
extract, transform, load
what is done to the data at each stage:
Extract

1. Determining the purpose and scope

2. Obtaining
Transform

3. Validating for completeness and integrity
 
4. Cleaning
Load
5. Loading the data
DO a VICy Clean Load
Step 1 Determining
the purpose: what problem are we trying to solve?

reliability and usefulness

nature, timing, and extent
Step 2 Obtaining
How to obtain it?

a standard data request form?

Where is the data?

What specific data?

What tools are needed?
ID what you need to Extract
Where is your info?

tables

attributes

relationships between the two
Step 3 Validating
ensure the extracted data is complete and has integrity
4 steps after extraction
1. Compare the number of records in the original and extracted data

2. Compare descriptive statistics

3. Validate date/time fields

4. Compare string limits
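The four validation checks above, sketched in Python under stated assumptions: records are hypothetical (amount, date, description) tuples, amounts are whole-dollar integers, and dates use the YYYY-MM-DD format. The string-limit check compares the longest text value in each set to spot truncation.

```python
import statistics
from datetime import datetime

def describe(values):
    # Descriptive statistics compared between the source and the extract
    return (len(values), sum(values), min(values), max(values), statistics.mean(values))

def is_valid_date(text, fmt="%Y-%m-%d"):
    # The date format is an assumption; match it to the source system
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

def validate_extract(source, extracted):
    """Four checks on lists of (amount, date, description) tuples."""
    return {
        # 1. Compare the number of records
        "record_count": len(source) == len(extracted),
        # 2. Compare descriptive statistics of the amount field
        "descriptive_stats": describe([r[0] for r in source]) == describe([r[0] for r in extracted]),
        # 3. Validate date fields
        "dates_valid": all(is_valid_date(r[1]) for r in extracted),
        # 4. Compare string limits: a shorter maximum length suggests truncated text
        "string_limits": max(len(r[2]) for r in source) == max(len(r[2]) for r in extracted),
    }

source = [(100, "2024-01-05", "Office supplies"), (250, "2024-01-09", "Travel")]
bad_extract = [(100, "2024-01-05", "Office sup"), (250, "2024-13-09", "Travel")]
print(validate_extract(source, list(source)))  # every check is True
print(validate_extract(source, bad_extract))
# {'record_count': True, 'descriptive_stats': True, 'dates_valid': False, 'string_limits': False}
```

The bad extract keeps the same record count and totals, yet the invalid month and the truncated description are still caught, which is why all four checks are run.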
Step 4
Cleaning
4 steps to clean
Remove headings or subtotals

Clean leading zeroes and nonprintable characters (NPCs)

Format negative numbers

Correct inconsistencies (US, U.S., United States, States)
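A minimal cleaning sketch covering the four steps, assuming a hypothetical raw export of (account, amount_text, country) rows. The skip labels, the country mapping, and the choice to strip zero padding from account IDs are all assumptions; in some data sets leading zeroes must be kept instead.

```python
COUNTRY_NAMES = {  # hypothetical mapping of inconsistent spellings
    "US": "United States",
    "U.S.": "United States",
    "States": "United States",
}
SKIP_LABELS = {"account", "subtotal", "total"}  # assumed heading/subtotal labels

def parse_amount(text):
    """Format negative numbers: '(1,234.50)' and '1,234.50-' both become -1234.5."""
    text = text.strip().replace(",", "").replace("$", "")
    negative = False
    if text.startswith("(") and text.endswith(")"):
        negative, text = True, text[1:-1]
    elif text.endswith("-"):
        negative, text = True, text[:-1]
    value = float(text)
    return -value if negative else value

def clean_rows(rows):
    """rows: (account, amount_text, country) tuples from a raw export (assumed layout)."""
    cleaned = []
    for account, amount_text, country in rows:
        # Remove nonprintable characters (e.g. '\x00', tabs) and surrounding spaces
        account = "".join(ch for ch in account if ch.isprintable()).strip()
        # Remove headings and subtotal/total rows
        if account.lower() in SKIP_LABELS:
            continue
        # Clean leading zeroes, assuming account IDs should not be zero-padded
        account = account.lstrip("0") or "0"
        # Correct inconsistent country spellings
        country = COUNTRY_NAMES.get(country.strip(), country.strip())
        cleaned.append((account, parse_amount(amount_text), country))
    return cleaned

raw = [
    ("Account", "Amount", "Country"),
    ("000123\x00", "(1,234.50)", "U.S."),
    ("0456", "89.10", "US"),
    ("Subtotal", "-1,145.40", ""),
]
print(clean_rows(raw))
# [('123', -1234.5, 'United States'), ('456', 89.1, 'United States')]
```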
Common quality issues
Dates
 
Numbers
 
International characters and encoding
 
Languages and measures
 
Human error
Step 5
Loading
 
if the previous steps were done correctly, loading should be the simplest step
Extract = Determining & Obtaining / 1 & 2
Transform = Validating & Cleaning / 3 & 4
Load = Load / 5

4 quads

Quad 1 = Qualitative + Declarative
Quad 2 = Quantitative + Declarative
Quad 3 = Qualitative + Exploratory
Quad 4 = Quantitative + Exploratory

Distributions

Normal Distribution
ANY mean and ANY standard deviation
 
The mean, median, and mode are all equal
 
Half the data falls below the mean, half above
 
SAT scores, IQ scores, heights and weights of newborn babies
Standard Normal Distribution
special case
not typical for data-driven quantitative data
A mean of exactly 0

A standard deviation of exactly 1

because the mean is 0, the median and mode are also 0
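Any normal distribution can be mapped onto the standard normal by converting each value to a z-score, z = (x - mean) / standard deviation. A minimal sketch with made-up scores (population standard deviation assumed); note that standardizing rescales data to mean 0 and SD 1 but does not make non-normal data normal.

```python
import statistics

def standardize(values):
    """Convert values to z-scores: z = (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, SD 2
z = standardize(scores)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(z), statistics.pstdev(z))  # 0.0 1.0
```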
Poisson Distribution
the probability of a specific number of events happening in a fixed time period
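For an average rate of λ events per period, the Poisson probability of exactly k events is P(X = k) = λ^k e^(-λ) / k!. A small sketch; the rate of 3 invoice errors per month is a hypothetical example.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events in a period when lam events are expected on average."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3  # hypothetical: 3 invoice errors per month on average
print(round(poisson_pmf(0, lam), 4))                        # 0.0498, no errors
print(round(sum(poisson_pmf(k, lam) for k in range(3)), 4))  # 0.4232, at most 2 errors
```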
 

Data Governance and Ethics

Institute of Business Ethics
6 Questions
Does the company...
How does the company use data?

...send privacy notices?

...assess the risks for the customer?

...have safeguards for the risks of data misuse?

...have the appropriate tools to manage the risks of data misuse?

...conduct appropriate due diligence when sharing with or acquiring data from third parties?
LMAO this is America, we have no data protection
This country is ruled by the amount of profit that shareholders can extract
No data rights for our own data
#bigbrother4profit

7 goals CH1

Developed Analytics Mindset
Recognize when and how data analytics can address business questions.
Data Scrubbing and Data Preparation
Comprehend the process needed to clean and prepare the data before analysis.
Data Quality
Recognize what is meant by data quality, be it completeness, reliability, or validity.
Descriptive Data Analysis
Perform basic analysis to understand the quality of the underlying data and its ability to address the business question.
Data Analysis through Data Manipulation
Demonstrate the ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis.
Statistical Data Analysis Competency
Identify and implement an approach that will use statistical data analysis to draw conclusions and make recommendations on a timely basis.
Data Visualization and Data Reporting
Report results of analysis in an accessible way to each varied decision maker and their specific needs.

Main Types of Data Analytics

Descriptive (past)
summarize existing data
 
what has happened
Diagnostic (current)
explore the data
 
why something has happened the way it has
Predictive (future)
used to generate a model
 
what is likely to happen
Prescriptive (current/ for future)
identify the best possible options given constraints
 
more advanced AI
 
optimize current processes
Descriptive
Summary statistics
mean, median, standard deviation, etc.
Data reduction or filtering
IFIF (Identify attribute, Filter, Interpret, Follow up)
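A minimal sketch of summary statistics plus a simple filter, using made-up expense amounts and an assumed review threshold of 1,000. It also shows why both mean and median are worth reporting: one large value pulls the mean far above the median.

```python
import statistics

amounts = [120.0, 95.5, 130.0, 110.0, 2400.0, 105.0]  # hypothetical travel expenses

summary = {
    "count": len(amounts),
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "stdev": statistics.stdev(amounts),  # sample standard deviation
    "min": min(amounts),
    "max": max(amounts),
}

# Filter: keep records above an assumed threshold so they can be interpreted and followed up
threshold = 1000.0
flagged = [a for a in amounts if a > threshold]
print(summary["median"], flagged)  # 115.0 [2400.0]
```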
Diagnostic
Benefits of Diagnostic
reduced external audit fees, reduced audit delay, fewer material weaknesses and restatements
Profiling
Characterizes the typical behavior

IDSIF (ID, Determine, Set boundaries, Interpret, Follow up)
How to profile?
Z-score

Box and whisker plot

interquartile range (IQR)
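Two common ways to flag values that fall outside the typical profile, sketched on hypothetical vendor payments: the box-and-whisker fences Q1 - 1.5*IQR and Q3 + 1.5*IQR, and a z-score cutoff. The 1.5 multiplier is the usual box-plot convention; the z cutoff of 2 is an assumption (3 is also common).

```python
import statistics

def iqr_outliers(values):
    """Flag values outside the box-and-whisker fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def z_outliers(values, cutoff=2.0):
    """Flag values whose z-score magnitude exceeds an assumed cutoff."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > cutoff]

payments = [40, 42, 45, 47, 50, 52, 120]  # hypothetical vendor payments
print(statistics.quantiles(payments, n=4))  # [42.0, 47.0, 52.0]
print(iqr_outliers(payments))  # [120]
print(z_outliers(payments))    # [120]
```

With a cutoff of 3 the z-score method would miss 120 here (its z is about 2.4), because the outlier itself inflates the standard deviation; the IQR fences are less sensitive to that.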
Clustering (finding natural groups)
Divides individuals into groups that share common underlying characteristics
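The cheat sheet does not name a clustering algorithm; k-means is one common choice. A tiny one-attribute k-means sketch on hypothetical order sizes, with hand-picked starting centers (an assumption; real tools choose them automatically and handle many attributes).

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means on a single numeric attribute; starting centers are chosen by hand."""
    for _ in range(iterations):
        # Assign each value to the group of its nearest center
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep the old center if the group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

orders = [5, 7, 6, 50, 55, 52, 200, 210]  # hypothetical order sizes
centers, groups = kmeans_1d(orders, centers=[0, 60, 250])
print(groups)  # [[5, 7, 6], [50, 55, 52], [200, 210]]
```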
Hypothesis Testing
tests whether P&C are meaningful
Similarity matching
Identifies similar individuals based on data already known about them
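A minimal similarity-matching sketch: find the known customer closest to a new one by Euclidean distance over numeric attributes. The customers and attributes are hypothetical, and in practice attributes measured in different units are usually scaled first; that step is omitted here.

```python
import math

def most_similar(target, candidates):
    """Return the candidate whose attributes are closest to target (Euclidean distance)."""
    return min(candidates, key=lambda name: math.dist(target, candidates[name]))

# Hypothetical customers: (annual purchases in $000s, years as customer)
customers = {"A": (120, 3), "B": (40, 10), "C": (115, 4)}
print(most_similar((118, 3), customers))  # A
```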
Co-occurrence grouping
Discovers associations between individuals based on transactions they are both involved in
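A minimal co-occurrence sketch: count how often each pair of items appears in the same transaction. The baskets are hypothetical; the most frequent pairs are candidate associations worth investigating.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(transactions):
    """Count how often each pair of items appears in the same transaction."""
    pairs = Counter()
    for items in transactions:
        # sorted(set(...)) counts each pair once per transaction, in a fixed order
        for pair in combinations(sorted(set(items)), 2):
            pairs[pair] += 1
    return pairs

baskets = [
    ["printer", "paper", "toner"],
    ["paper", "toner"],
    ["paper", "pens", "toner"],
    ["printer", "toner"],
]
counts = co_occurrence(baskets)
print(counts.most_common(1))  # [(('paper', 'toner'), 3)]
```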
Predictive
 
target = attribute or value to evaluate
 
class = assigned category (to record for event)
Regression (number)
Estimates or predicts a numerical value for a variable using a statistical model
IV/DFF/IP/EF
Identify the variables, Determine the functional form, Identify parameters, Evaluate fit
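The four regression steps as a minimal Python sketch, with hypothetical data: the variables are machine hours (predictor) and utility cost (response), the functional form is assumed linear (cost = a + b * hours), the parameters a and b are estimated by ordinary least squares, and fit is evaluated with R².

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

hours = [10, 20, 30, 40, 50]       # hypothetical predictor
cost = [150, 250, 370, 450, 580]   # hypothetical response
a, b, r2 = fit_line(hours, cost)
print(round(a, 2), round(b, 2), round(r2, 3))  # 42.0 10.6 0.996
```

With these parameters, 35 machine hours would predict a cost of about 42 + 10.6 * 35 = 413.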
Classification (sorting into predefined categories)
Assigns each unit to one of a small set of categories or classes
Link Prediction (relationship)
Predicts a relationship between two data items
Prescriptive
Decision support systems
Rule-based systems that gather data and recommend actions
Artificial intelligence
Learning models that adapt to new data over time to make recommendations
Benford’s law
The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small.
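Under Benford's law the expected share of leading digit d is log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% with 9. A minimal sketch comparing that to observed first digits; the invoice amounts are hypothetical, and a real test needs far more than ten values.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit d under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    # First significant digit: drop leading zeros and the decimal point, e.g. 0.0042 -> 4
    return int(str(abs(x)).lstrip("0.")[0])

def observed_shares(amounts):
    """Share of nonzero amounts starting with each digit 1-9."""
    counts = Counter(leading_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

expected = benford_expected()
print(round(expected[1], 3), round(expected[9], 3))  # 0.301 0.046

invoices = [1200, 150, 1875, 23, 310, 19.99, 1450, 260, 980, 105]  # hypothetical
print(observed_shares(invoices)[1])  # 0.6
```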
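overfitting
overly complex models (fit noise in the training data)
underfitting
overly simple models (miss the underlying pattern)
significance level = alpha
t-test: if p-value < alpha, the result is statistically significant; otherwise it is not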

Graphs Charts and Tables OH MY!

Why Pictures?
statistics
alone can be misleading
visualizations
Purpose
Declarative
presenting findings
Exploratory
discovering insights
Qualitative
categorical
Quantitative
numerical
Data Types
Qualitative Data (categorical):
Nominal
can only count and group
Ordinal
can also rank
Quantitative Data (numerical):
Interval (no true zero)
can also measure differences
Ratio
has a meaningful zero
The right chart
QUAL
Bar/Column C.
compares proportions of categories
Pie chart C.
parts of a whole
Stacked bar C.
shows proportion AND allows comparison
Word cloud
used for text data
QUAN
Line C. (continuous)
trends over time
Box and whisker P.
quartiles, medians, and outliers
Scatter P.
correlation between two variables or a trend line
Filled geographic map
data ranges across geography
Refining
readability
How much data do you need to show?

Should outliers be displayed or removed?
make differences look dramatic
What scale should be used?

Do you need reference points to make the scale meaningful?
distract from data
When should you use multiple colors?
Reports
get to the point
I: Explain what was being researched
M: Overview of the data source and what data was included
P: Describe the analytical approach used
A: Present the results of the analysis
C: Communicate the insights and what they mean
T: Describe what outcomes will be tracked going forward

Data Types

Discrete Data
whole numbers
 
number of students in a class
Interval Data
the differences between values are meaningful and equal

Fahrenheit temperature
 
time on a clock
Ratio data
requires a true zero
 
0 pounds = no weight
 
0 dollars = no money no problems