
introduction_to_data_science_ii Cheat Sheet (DRAFT)

final exam cheat sheet

This is a draft cheat sheet. It is a work in progress and is not finished yet.


standard units: z = (original - mean) / standard deviation
β1 = cov(x,y) / var(x)
cov(x,y) = (∑ (y_i - ȳ)(x_i - x̄)) / n
corr(x,y) = cov(x,y) / (s_x * s_y)
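the four formulas above, sketched in NumPy (population versions, dividing by n; the data values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

z = (x - x.mean()) / x.std()                        # standard units
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()   # cov(x,y), dividing by n
beta1 = cov_xy / x.var()                            # regression slope β1
corr = cov_xy / (x.std() * y.std())                 # correlation coefficient
```

correlation is scale-invariant, so dividing by n vs. n−1 cancels out and this matches `np.corrcoef(x, y)`.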

hierarchical clustering

find successive clusters using previously established clusters
- common approach is bottom-up: start with each element in a separate cluster
single linkage: use minimum distance
complete linkage: use maximum distance
average linkage: use average distance
"minimum distance between group 1 points and group 2 points is larger than the minimum within-group distance for the same points"
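a minimal bottom-up (agglomerative) clustering sketch with SciPy, assuming `scipy` is available; the sample points are made up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])

# method='single' -> minimum distance, 'complete' -> maximum, 'average' -> mean
Z = linkage(points, method='single')
labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 clusters
```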

knn classification

classify a new point by majority vote among the labels of its k nearest training points (nearest by distance in feature space)

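a minimal kNN classifier sketch in NumPy, using Euclidean distance and majority vote; the training points and k=3 are made-up toy values:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
knn_predict(X, y, np.array([0.05, 0.0]))  # nearest neighbors are mostly class 0
```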
k-means algorithm: 1. construct clusters by associating each point with the closest centroid; 2. calculate new centroids for each set; repeat both steps until convergence
as k increases, average within-cluster variance decreases
use features (x_n) to partition data into K clusters (each represented by its centroid -- the center of the points in the cluster)
goal is to minimize intra-cluster point-to-centroid distances: find c_nk (0-1 cluster membership) and µ_k (centroids) that minimize ∑_n ∑_k c_nk ‖x_n − µ_k‖²
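the two repeated steps above, sketched in NumPy (toy 2D data and K=2 are made-up values):

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]  # random initial centroids
    for _ in range(iters):
        # step 1: assign each point to its closest centroid
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(axis=2), axis=1)
        # step 2: recompute each centroid as the mean of its assigned points
        new = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new, centroids):                  # converged
            break
        centroids = new
    return labels, centroids

X = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]])
labels, centroids = kmeans(X, K=2)
```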

artificial neural networks

activation functions:
- sigmoid (0, 1)
sigmoid(z) = exp(z) / (1 + exp(z))
- hyperbolic tangent (-1, 1)
tanh(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z))
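a direct NumPy transcription of the two activation functions above (note NumPy also ships `np.tanh` as a built-in):

```python
import numpy as np

def sigmoid(z):
    # squashes any real z into (0, 1)
    return np.exp(z) / (1.0 + np.exp(z))

def tanh(z):
    # squashes any real z into (-1, 1)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))
```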