
CS412: Final Exam Cheat Sheet (DRAFT) by

blah

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Types of Data

Record: data matrix (crosstabs), document data (term-frequency vectors / text documents)
Graph/Network: WWW, Facebook, molecular structures
Ordered: video data (sequence of images), temporal data (time series), genetic sequence data
Spatial/Image/Multimedia: maps, photos, videos

Median interval

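The exact median is difficult to compute for large amounts of data, so for grouped data it is approximated by interpolating within the median interval (the interval that contains the middle value):
$$\text{median} \approx L_1 + \left(\frac{N/2 - (\sum \text{freq})_l}{\text{freq}_{\text{median}}}\right) \times \text{width}$$
where L1 is the lower boundary of the median interval, N is the number of values in the entire dataset, (Σ freq)_l is the sum of the frequencies of all intervals below the median interval, freq_median is the frequency of the median interval, and width is the width of the median interval.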
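A minimal Python sketch of the median-interval interpolation described above (L1, N, cumulative frequency below the median interval, freq_median, width). The input format, a list of (lower boundary, upper boundary, frequency) tuples sorted by boundary, and the name `grouped_median` are assumptions for illustration only.

```python
def grouped_median(intervals):
    """Approximate the median of grouped data by linear interpolation.

    intervals: hypothetical input format, a list of
    (lower_boundary, upper_boundary, frequency) tuples, sorted and non-overlapping.
    """
    n = sum(freq for _, _, freq in intervals)  # N: number of values in the dataset
    if n == 0:
        raise ValueError("no data")
    half = n / 2
    cum_below = 0  # total frequency of all intervals below the current one
    for lower, upper, freq in intervals:
        if cum_below + freq >= half:
            # First interval whose cumulative frequency reaches N/2: the median interval.
            width = upper - lower
            return lower + (half - cum_below) / freq * width
        cum_below += freq
    raise ValueError("median interval not found")


# 20 values: 5 in [0, 10), 10 in [10, 20), 5 in [20, 30)
print(grouped_median([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))
```

Here N/2 = 10 falls inside [10, 20), with 5 values below it, so the estimate is 10 + (10 - 5) / 10 × 10.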
 

Attribute Types: Key Points

Binary attribute type
A subtype of the nominal (categorical) attribute type with only two states (0 or 1); also discrete
Symmetric binary vs. asymmetric binary
Both outcomes equally important vs. outcomes not equally important
Numeric: interval-scaled vs. ratio-scaled
No true zero point, e.g. temperature in °C or °F (not Kelvin) vs. true zero point, so ratios are meaningful, e.g. temperature in Kelvin, length, counts
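A small worked example of why ratios only make sense on a ratio-scaled attribute. It assumes the standard conversion K = °C + 273.15; the variable names are illustrative.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin (a ratio-scaled unit with a true zero)."""
    return c + 273.15


warm_c, cool_c = 20.0, 10.0

# Interval-scaled: Celsius has no true zero, so this ratio suggests "twice as hot",
# which is physically meaningless.
naive_ratio = warm_c / cool_c

# Ratio-scaled: Kelvin has a true zero, so this ratio is meaningful (roughly 1.035).
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales: a 10-degree gap either way.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

print(naive_ratio, round(kelvin_ratio, 3), diff_c, round(diff_k, 6))
```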

Measures of central tendency: Mode/Midrange

Unimodal vs. multimodal vs. bimodal vs. trimodal vs. no mode
One mode vs. more than one mode vs. two modes vs. three modes vs. every value occurs only once
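A minimal Python sketch that finds the modes of a dataset and labels its modality using the categories above. The function names are hypothetical, and it assumes "no mode" means every value occurs exactly once.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] if each value occurs only once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]


def modality(values):
    """Label a dataset by its number of modes (4 or more modes is just 'multimodal')."""
    k = len(modes(values))
    return {0: "no mode", 1: "unimodal", 2: "bimodal", 3: "trimodal"}.get(k, "multimodal")


print(modality([1, 2, 2, 3]), modality([1, 1, 2, 2, 3]), modality([1, 2, 3]))
```

Bimodal and trimodal data are also multimodal; the function just reports the most specific label.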
Empirical formula for unimodal data
Moderately asymmetrical (skewed) data: mean - mode ≈ 3 × (mean - median)
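Rearranging the empirical relation mean - mode ≈ 3 × (mean - median) gives mode ≈ 3 × median - 2 × mean, which can estimate the mode when only the mean and median are known. A minimal sketch; the function name is hypothetical, and the result is only a rough estimate for moderately skewed unimodal data.

```python
def estimate_mode(mean, median):
    """Empirical mode estimate for moderately skewed unimodal data.

    From mean - mode ≈ 3 * (mean - median)  =>  mode ≈ 3 * median - 2 * mean.
    """
    return 3 * median - 2 * mean


print(estimate_mode(50, 48))  # mean 50, median 48
```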
Symmetric vs. positively skewed vs. negatively skewed data
mean = median = mode at the same center vs. mode < median < mean (right-skewed) vs. mean < median < mode (left-skewed)
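A rough Python sketch that classifies skew direction by comparing mean and median, following the orderings above. This mean-vs-median comparison is a heuristic (it can fail on some distributions); the function name and tolerance are assumptions.

```python
import statistics


def skew_direction(values, tol=1e-9):
    """Heuristic: mean > median suggests right skew, mean < median suggests left skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if abs(mean - median) <= tol:
        return "symmetric"
    return "positively skewed" if mean > median else "negatively skewed"


print(skew_direction([1, 2, 2, 3, 10]))  # long tail on the right
```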
Midrange
(highest value + lowest value) / 2
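The midrange formula as a one-line Python function (the name is illustrative). Because it depends only on the two extreme values, a single outlier shifts it heavily.

```python
def midrange(values):
    """Average of the highest and lowest values."""
    return (max(values) + min(values)) / 2


print(midrange([3, 9, 1, 7]))
```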

Measures of central tendency: Mean

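Sample mean (n values), population mean (N values), and weighted mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\mu = \frac{\sum x}{N}$$
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$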
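A minimal Python sketch of the three means. The sample mean (sum of n values divided by n) and the population mean (sum of all N values divided by N) share one computation; the weighted mean divides the sum of w_i × x_i by the sum of the weights. Function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean; the same formula serves as sample mean (n) or population mean (N)."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


print(mean([2, 4, 6]), weighted_mean([80, 90], [1, 3]))
```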
Most useful measure of center, but sensitive to skewed data and outliers
Solution: trimmed mean, i.e. the mean after chopping off extreme values at both ends. Trimming too much loses valuable information.
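A minimal sketch of a trimmed mean, assuming the same fraction is cut from each end of the sorted data; the name `trimmed_mean` and the `proportion` parameter are illustrative choices.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the lowest and highest `proportion` of the sorted values.

    proportion is the fraction removed from EACH end (0.1 drops 10% low and 10% high).
    """
    if not values:
        raise ValueError("no data")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trimmed_mean(data, 0.0), trimmed_mean(data, 0.1))
```

The outlier 100 pulls the plain mean up to 14.5, while trimming one value from each end gives 5.5. SciPy's `scipy.stats.trim_mean(a, proportiontocut)` implements the same idea if SciPy is available.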