Cheatography

# Principal Component Analysis Cheat Sheet (DRAFT) by dganesh

An unsupervised machine learning algorithm for dimensionality reduction

This is a draft cheat sheet. It is a work in progress and is not finished yet.

### Motivation

1. Handle high multicollinearity. Existing solution: variable selection (stepwise/forward/backward). Con: each time a variable is dropped, some information is lost.
2. Visualize many features. Existing solution: pairwise scatter plots, pC2 = p*(p-1)/2 of them, where p is the number of variables. Con: if p = 20, that would mean 190 plots!
There must be a better way of doing this. The goal is to find an algorithm that reduces the number of variables without losing much information, i.e. PCA.
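The pairwise-plot count above is simple arithmetic; a quick sketch to check it (the helper function name is ours, not from the sheet):

```python
def pairwise_plot_count(p):
    """Number of distinct variable pairs among p variables: pC2 = p*(p-1)/2."""
    return p * (p - 1) // 2

print(pairwise_plot_count(20))  # 190 plots for just 20 variables
```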

### Use Cases

1. Dimensionality reduction without losing information.
2. Easy data visualization and exploratory data analysis.
3. Creating uncorrelated features/variables that can be input to a prediction model.
4. Uncovering latent variables/themes/concepts.
5. Noise reduction in the dataset.

### Prerequisite Knowledge

Building blocks:

1. **The basis of a space:** a set of linearly independent vectors/directions that span the entire space, i.e. any point in the space can be represented as a combination of these vectors. Ex: each row of a dataset is a point in the space; each column is a basis vector (any point is represented in terms of the columns).
2. **Basis transformation:** the process of converting your information from one set of basis vectors to another, or representing your data in new columns different from the original ones. Often done for convenience, efficiency, or just common sense. Ex: dropping or adding a column of the dataset.
3. **Variance as information:** variance = information. If two variables are highly correlated, together they don't add much more information than each does individually, so one of them can be dropped.
In 2D geometry, the X and Y axes are the dimensions. i = (1, 0) is the unit vector in the X direction and j = (0, 1) is the unit vector in the Y direction. For a point a, the coordinates (ax, ay) are the units to move in the i and j directions to reach a, so a can be written as ax·i + ay·j. Any point in 2D space can be represented in terms of i and j, so i and j form the 'basis of the space'. They are independent: i can't be expressed in terms of j and vice versa.
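The basis ideas above can be sketched in a few lines of NumPy (the rotated basis u, v is our illustrative choice, not from the sheet):

```python
import numpy as np

# Standard basis of 2D space: i = (1, 0), j = (0, 1).
i = np.array([1.0, 0.0])
j = np.array([0.0, 1.0])

# Any point a = (ax, ay) is the combination ax*i + ay*j of the basis vectors.
a = np.array([3.0, 2.0])
assert np.allclose(a, 3.0 * i + 2.0 * j)

# Basis transformation: express the same point in a new basis, e.g. the
# rotated directions u = (1, 1)/sqrt(2) and v = (-1, 1)/sqrt(2).
B = np.column_stack([[1, 1], [-1, 1]]) / np.sqrt(2)  # new basis as columns
coords_new = np.linalg.solve(B, a)  # coordinates of 'a' in the new basis
assert np.allclose(B @ coords_new, a)  # same point, different representation
print(coords_new)
```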

### What does it do?

PCA is one of a family of techniques for taking high-dimensional data and using the dependencies between the variables to represent it in a more tractable, lower-dimensional basis, without losing too much information. Given p features/variables in a dataset, PCA finds principal components that 1. are linear combinations of the original features, and 2. capture the maximum variance in the dataset.

### Mathematical Representation

The first principal component is

Z₁ = φ₁₁X₁ + φ₂₁X₂ + … + φₚ₁Xₚ, subject to Σⱼ φⱼ₁² = 1

PCA finds the φ (loading) values such that the variance of Z₁ is maximal. The second principal component is the one with maximal variance among all linear combinations that are uncorrelated with Z₁. Likewise, each additional component captures incremental variance. The algorithm calculates k principal components, where k ≤ p (p is the number of variables in the dataset).
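A sketch of the properties above on a toy dataset (the same X used in the Code section), checking that the loadings have unit length and that the components are uncorrelated:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 6 points, p = 2 variables.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

pca = PCA(n_components=2).fit(X)

# Loadings phi for the first principal component: a unit-length vector.
phi1 = pca.components_[0]
assert np.isclose(np.sum(phi1**2), 1.0)  # sum of phi^2 equals 1

# Z1 = phi11*X1 + phi21*X2 (component scores, computed on centered data).
Z = (X - X.mean(axis=0)) @ pca.components_.T

# Z1 and Z2 are uncorrelated: their covariance is (numerically) zero.
assert abs(np.cov(Z.T)[0, 1]) < 1e-9

print(np.var(Z[:, 0], ddof=1))  # variance captured by the first component
```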

### Workings of PCA

1. Find the principal components, using SVD (Singular Value Decomposition).
2. Choose the optimal number of principal components, k.
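Step 2 is commonly done by looking at the cumulative explained variance; a sketch on a made-up dataset (the data, the 95% threshold, and the construction with near-duplicate columns are our assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 5 variables, two of which are
# near-duplicates of others, so ~3 components should suffice.
base = rng.normal(size=(100, 3))
X = np.column_stack([base,
                     base[:, 0] + 0.01 * rng.normal(size=100),
                     base[:, 1] + 0.01 * rng.normal(size=100)])

pca = PCA().fit(X)  # keep all components, then inspect variance captured
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Choose the smallest k that explains, say, 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(k)
```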

### Singular Value Decomposition (SVD)

'Decomposition' because it breaks the original (centered) data matrix X into 3 new matrices: X = UΣVᵀ. The rows of Vᵀ give the principal directions, and UΣ gives the component scores.
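A sketch of the decomposition with NumPy on the toy data from the Code section, checking that it agrees with sklearn's PCA (signs of individual components may be flipped between the two):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Xc = X - X.mean(axis=0)  # PCA works on the centered data matrix

# SVD breaks the centered matrix into three matrices: Xc = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(Xc, U @ np.diag(S) @ Vt)

# Rows of Vt are the principal directions; U * S are the component scores.
pca = PCA(n_components=2).fit(X)
assert np.allclose(np.abs(Vt), np.abs(pca.components_))
```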

### Code

```python
from sklearn.decomposition import PCA
import numpy as np

# Toy dataset: 6 samples, 2 features
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Fit PCA, keeping both principal components
pca = PCA(n_components=2)
pca.fit(X)
```
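After fitting, the model is typically used like this (a minimal usage sketch continuing the example above):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)

# Fraction of total variance captured by each component.
print(pca.explained_variance_ratio_)

# Project the data onto the principal components (the new basis).
Z = pca.transform(X)
print(Z.shape)  # (6, 2)
```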