
Scikit-Learn Python Cheat Sheet (DRAFT) by

Scikit-Learn is one of the most widely used Python libraries for machine learning and statistical modelling. It is built on NumPy, SciPy, and Matplotlib.

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Machine Learning

Supervised Learning
The model learns to map inputs to outputs from previously seen, labeled input-output pairs.

Unsupervised Learning
No labeled outputs are given; the model has to discover structure in the input features on its own.

Scikit-learn can be used for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing, with both supervised and unsupervised models.
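The difference between the two settings shows up directly in the `fit` call. The sketch below is only an illustration on the bundled iris data; the choice of `KNeighborsClassifier` and `KMeans`, and the variable names, are assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Supervised: the estimator is fitted on inputs X together with known labels y.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
supervised_pred = clf.predict(X[:3])

# Unsupervised: the estimator is fitted on X alone and groups the samples itself.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.predict(X[:3])
```

Cluster ids are arbitrary integers, so they need not match the numbering of the true classes.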

Basic Commands

>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Loading Data example

>>> import numpy as np
>>> X = np.random.random((11, 2))  # one row per label in y
>>> y = np.array(['A','B','C','D','E','F','G','A','C','A','B'])
>>> X[X < 0.7] = 0
The feature data (X) should be numeric and stored as NumPy arrays or SciPy sparse matrices. Class labels (y) may be strings.
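A small sketch of both accepted input forms, assuming toy values chosen here for illustration: the same feature matrix is passed once as a dense NumPy array and once as a SciPy CSR matrix. `KNeighborsClassifier` is just one estimator that takes either form.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_dense = rng.random((11, 2))
X_dense[X_dense < 0.7] = 0          # mostly zeros, so a sparse format makes sense
y = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'A', 'C', 'A', 'B'])

# Same values, stored in compressed sparse row format.
X_sparse = sparse.csr_matrix(X_dense)

# The estimator can be fitted on either representation.
pred_dense = KNeighborsClassifier(n_neighbors=1).fit(X_dense, y).predict(X_dense)
pred_sparse = KNeighborsClassifier(n_neighbors=1).fit(X_sparse, y).predict(X_sparse)
```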
 

Processing Loaded Data

Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)

Normalization
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)

Binarization
>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)
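To see what each transformer does, it can help to run all three on a tiny hand-made matrix. This is a sketch with made-up values: standardization works per column (zero mean, unit variance), normalization works per row (unit L2 norm by default), and binarization thresholds each value.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, Normalizer, StandardScaler

X = np.array([[1.0, -2.0],
              [3.0,  0.0],
              [5.0,  2.0]])

# Per column: subtract the mean, divide by the standard deviation.
standardized = StandardScaler().fit_transform(X)

# Per row: rescale so each sample has L2 norm 1.
normalized = Normalizer().fit_transform(X)

# Per value: 1.0 if strictly greater than the threshold, else 0.0.
binary = Binarizer(threshold=0.0).fit_transform(X)
```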

Training And Test Data

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
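A quick sketch of what the split returns, using toy arrays assumed for this example. With no `test_size` given, a quarter of the samples go to the test set; `test_size` and `stratify` are optional parameters that control the proportion and keep class ratios balanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)   # 20 samples, 2 features
y = np.array([0, 1] * 10)          # two balanced classes

# Default split: 25% of the samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Explicit 20% test set that preserves the class proportions of y.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
```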

Creating Model

Supervised Learning Estimators

Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> # The normalize argument was removed in scikit-learn 1.2; scale features beforehand if needed.
>>> lr = LinearRegression()

Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')

Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()
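Recent scikit-learn releases (1.2 and later) no longer accept a `normalize` argument on `LinearRegression`. One possible alternative, sketched here on synthetic data assumed for the example, is to chain a `StandardScaler` in front of the model with `make_pipeline`. This is not numerically identical to the old option, which used a different scaling.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two features on very different scales (synthetic, for illustration only).
X = rng.random((50, 2)) * [1.0, 1000.0]
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + 1.0

# The scaler is fitted and applied automatically before the regression.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)
r2 = model.score(X, y)   # coefficient of determination on the training data
```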

Creating Model

Unsupervised Learning Estimators

Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)

K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)
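A float `n_components` between 0 and 1 asks PCA to keep as many components as are needed to explain at least that fraction of the variance. The sketch below checks this on the iris data, which is an assumption of this example rather than part of the cheat sheet.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Keep enough components to explain at least 95% of the variance.
pca = PCA(n_components=0.95).fit(X)

kept = pca.n_components_                          # number of components retained
explained = pca.explained_variance_ratio_.sum()   # fraction of variance they cover
X_reduced = pca.transform(X)
```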
 

Model Fitting

Supervised Learning
>>> lr.fit(X, y)
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)

Unsupervised Learning
>>> k_means.fit(X_train)
>>> pca_model = pca.fit_transform(X_train)

Predicting output

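Supervised Estimators
>>> # New samples must have the same number of features as the training data (2 here).
>>> y_pred = svc.predict(np.random.random((2, 2)))
>>> y_pred = lr.predict(X_test)
>>> # predict_proba returns class probabilities, one column per class.
>>> y_pred = knn.predict_proba(X_test)

Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)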
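`predict` and `predict_proba` return different things, which is easy to miss when both results are stored under the same name. A minimal sketch on the iris data (an assumption of this example): `predict` gives one label per sample, while `predict_proba` gives one row of class probabilities per sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

labels = knn.predict(X[:4])        # shape (4,): one class label per sample
proba = knn.predict_proba(X[:4])   # shape (4, 3): one probability per class

# Columns of proba follow the order of knn.classes_.
most_likely = knn.classes_[np.argmax(proba, axis=1)]
```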

Classification Metrics Model Performance

Accuracy Score
>>> knn.score(X_test, y_test)
>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(y_test, y_pred)

Classification Report
>>> from sklearn.metrics import classification_report
>>> print(classification_report(y_test, y_pred))

Confusion Matrix
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))
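The three metrics can be checked by hand on a few labels. In this sketch the label lists are made up for illustration: four of the six predictions are correct, and the confusion matrix has true classes as rows and predicted classes as columns.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_test = [0, 0, 1, 1, 2, 2]   # ground-truth labels (toy values)
y_pred = [0, 1, 1, 1, 2, 0]   # predicted labels (toy values)

acc = accuracy_score(y_test, y_pred)             # fraction of correct predictions
cm = confusion_matrix(y_test, y_pred)            # rows: true class, columns: predicted
report = classification_report(y_test, y_pred)   # precision, recall, F1 per class

print(acc)
print(cm)
print(report)
```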

Clustering Metrics Model Performance

Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> # y_test holds the ground-truth labels.
>>> adjusted_rand_score(y_test, y_pred)

Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_test, y_pred)

Cross-Validation
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))
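Clustering metrics compare groupings rather than label values, so renaming the clusters does not change the score. The sketch below uses made-up label lists to show this, and then runs a 4-fold cross-validation on the iris data. Both the toy labels and the choice of dataset are assumptions of this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, homogeneity_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

y_true = [0, 0, 1, 1, 2, 2]
y_renamed = [2, 2, 0, 0, 1, 1]   # same grouping, different cluster ids

# Both scores reach their maximum because the groupings are identical.
ari = adjusted_rand_score(y_true, y_renamed)
hom = homogeneity_score(y_true, y_renamed)

# Cross-validation: one accuracy score per fold.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=4)
print(scores.mean())
```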