Cheatography

Data Science Essentials in Python Cheat Sheet (DRAFT) by

Book Summary

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Data Analysis Sequence

Question to Answer
Data Acquisition              What? Where? How? Format?
Data Cleaning                 Missing values? Outliers? String format? Normalization
Descriptive Analysis          Reporting aggregate measures: scatter plots, histograms, statistical summaries
Exploratory Analysis          Relationships between variables: inferential, predictive, causal, mechanistic data analysis
Data Modeling                 Training and testing
Quality Assessment of Models  Overfitting? Underfitting?
Interpretation of Results     Domain expertise
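
A minimal sketch of the first three stages above (acquisition, cleaning, descriptive analysis), using only the standard library. The inline CSV data and the function names acquire, clean, and describe are hypothetical, chosen for illustration; the later stages (exploration, modeling, quality assessment) are not covered.

```python
import csv
import io
import statistics

# Hypothetical inline CSV standing in for an acquired data source.
RAW_CSV = """city,temp_c
Oslo,4.5
Cairo,
Lima,19.0
Pune,27.5
"""

def acquire(text):
    """Data acquisition: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Data cleaning: drop rows with a missing temperature, convert the rest to float."""
    return [
        {"city": r["city"].strip(), "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"].strip()
    ]

def describe(rows):
    """Descriptive analysis: report simple aggregate measures."""
    temps = [r["temp_c"] for r in rows]
    return {
        "count": len(temps),
        "mean": statistics.mean(temps),
        "min": min(temps),
        "max": max(temps),
    }

rows = clean(acquire(RAW_CSV))
summary = describe(rows)
print(summary)
```

Dropping incomplete rows is only one way to handle missing values; imputing a default or a column average is an equally valid choice, depending on the question being answered.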
 

Data Acquisition Pipeline

SOURCES               FORMAT                     REPRESENTATION
Internet              Plain Text (unstructured)  List, Tuple, Set
File / Pickled File   CSV                        Array, Matrix
Database              HTML/XML                   Frame, Series
                      JSON                       Dictionary
                      Tabular
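
A rough sketch of how some of the formats above might be loaded into the listed representations, using only the standard library. Pairing each format with the representation on the same row is an assumption (the columns may be independent lists), and all sample data is made up. A nested list stands in for an array or matrix; HTML/XML, database sources, and frames or series are left out.

```python
import csv
import io
import json
import pickle

# Plain (unstructured) text -> list and set
text = "to be or not to be"
words = text.split()
unique_words = set(words)

# CSV -> nested list of numbers (a simple stand-in for an array or matrix)
csv_text = "x,y\n1,2\n3,4\n"
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # first row holds the column names
matrix = [[int(value) for value in row] for row in reader]

# JSON -> dictionary
record = json.loads('{"name": "Ada", "langs": ["Python", "R"]}')

# Pickled data -> the original Python object
restored = pickle.loads(pickle.dumps(matrix))

print(words, unique_words, header, matrix, record, restored, sep="\n")
```

Only unpickle data from a trusted source, since loading a pickle can execute arbitrary code.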