Cheatography

Sci-Kit learn Cheat Sheet (DRAFT) by

Scikit-learn is one of the most widely used machine-learning libraries for Python. It is mainly used to build supervised and unsupervised learning models. With scikit-learn, it is easy to load a dataset, preprocess it to reduce noise, and evaluate the performance of a model.

This is a draft cheat sheet. It is a work in progress and is not finished yet.

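Loading the data

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X = np.random.random((9, 6))                                  # 9 samples, 6 features
>>> y = np.array(['M', 'M', 'F', 'F', 'M', 'M', 'F', 'M', 'F'])   # one label per sample
>>> X[X <= 0.2] = 0                                               # zero out small values
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)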
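The random array above is only a stand-in. Scikit-learn also ships small toy datasets that can be loaded in one call. A minimal sketch, assuming the bundled iris dataset; `test_size=0.25`, `random_state=0`, and `stratify=y` are illustrative choices, not requirements:

```python
# Load a bundled toy dataset and hold out a test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # X: (150, 4) feature matrix, y: (150,) class labels

# Keep 25% of the rows for testing; stratify keeps class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```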

Estimators used in unsupervised learning models

K-Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)
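To see what `KMeans` learns, here is a minimal sketch on hypothetical synthetic data: three well-separated groups of 20 points each. `n_init=10` is set explicitly only to keep the result stable across scikit-learn versions whose default differs.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: 20 noisy points around each of three centres
centers = np.array([[0, 0], [10, 10], [-10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = k_means.fit_predict(X)           # cluster index assigned to every row
print(k_means.cluster_centers_.round(1))  # close to the three centres above (in some order)
```

Cluster indices are arbitrary: the group around `[0, 0]` is not guaranteed to receive label 0.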

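Model Fitting

>>> k_means.fit(X_train)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)              # keep enough components for 95% of the variance
>>> X_train_pca = pca.fit_transform(X_train)  # fits PCA and returns the projected data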
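Passing a float between 0 and 1 as `n_components` tells PCA to keep the smallest number of components that explain at least that share of the variance. A minimal sketch, assuming hypothetical data whose 5 features are driven by only 2 underlying directions plus a little noise:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples, 5 features generated from 2 latent directions
latent = rng.normal(size=(100, 2))
X_train = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(100, 5))

pca = PCA(n_components=0.95)              # keep >= 95% of the variance
X_train_pca = pca.fit_transform(X_train)  # projected data, shape (100, n_components_)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```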
Prediction
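Once fitted, estimators expose `predict`; classifiers such as k-nearest neighbors also offer `predict_proba`. A minimal sketch, assuming the iris dataset and an illustrative split (even rows to fit, odd rows to predict):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_new = X[::2], X[1::2]   # illustrative split: even rows fit, odd rows predicted
y_train = y[::2]

# Supervised: class labels and per-class probabilities for unseen rows
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = knn.predict(X_new)         # shape (75,)
y_proba = knn.predict_proba(X_new)  # shape (75, 3), each row sums to 1

# Unsupervised: assign unseen rows to the nearest learned cluster centre
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
clusters = k_means.predict(X_new)   # shape (75,)
```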

Regression Metrics

Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> y_pred = [2.5, 0.0, 2]            # hypothetical predictions
>>> mean_absolute_error(y_true, y_pred)
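Mean absolute error is the average of the absolute differences between true and predicted values. A minimal sketch checking `mean_absolute_error` against that formula by hand, using the same hypothetical predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3, -0.5, 2])
y_pred = np.array([2.5, 0.0, 2])   # hypothetical predictions

# MAE = mean(|y_true - y_pred|) = (0.5 + 0.5 + 0.0) / 3 = 1/3
manual = np.mean(np.abs(y_true - y_pred))
mae = mean_absolute_error(y_true, y_pred)
print(manual, mae)  # both equal 1/3
```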
 

Data preprocessing

Normalization technique
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)          # rescales each sample (row) to unit length
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)
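`Normalizer` works row by row, while standardization with `StandardScaler` works column by column. A minimal sketch contrasting the two on a small hypothetical matrix:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

X_train = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])  # hypothetical data

# Normalizer is stateless: fit() only validates the input
X_norm = Normalizer(norm="l2").fit(X_train).transform(X_train)
print(X_norm)                            # rows become [0.6, 0.8], [1, 0], [0, 1]
print(np.linalg.norm(X_norm, axis=1))    # every row now has length 1

# StandardScaler learns a per-column mean and standard deviation
X_std = StandardScaler().fit_transform(X_train)
print(X_std.mean(axis=0), X_std.std(axis=0))  # approximately 0 and 1 per column
```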

Binarization technique

>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)   # values > threshold become 1, the rest 0
>>> binary_X = binarizer.transform(X)
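A minimal sketch showing the thresholding rule on a small hypothetical array: values strictly greater than the threshold map to 1, and values equal to or below it map to 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.0, 0.3, -1.2],
              [0.5, 0.0, 2.0]])  # hypothetical data

binarizer = Binarizer(threshold=0.0).fit(X)
binary_X = binarizer.transform(X)
# 0.0 equals the threshold, so it maps to 0; only positive entries map to 1
print(binary_X)
```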

Imputing Missing Values

>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(missing_values=0, strategy='mean')   # replace zeros with each column's mean
>>> imp.fit_transform(X_train)
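Missing values are more often encoded as `NaN` than as 0. A minimal sketch on a hypothetical matrix, where `SimpleImputer` fills each gap with the mean of the observed values in that column:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, 2.0],
                    [np.nan, 4.0],
                    [7.0, np.nan]])  # hypothetical data with gaps

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp.fit_transform(X_train)
print(imp.statistics_)  # column means ignoring NaN: 4.0 and 3.0
print(X_filled)         # NaNs replaced by those means
```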

Performance metrics

Accuracy Score
>>> knn.fit(X_train, y_train)        # knn as defined under K-nearest neighbors
>>> knn.score(X_test, y_test)
>>> from sklearn.metrics import accuracy_score
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)
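Accuracy is the fraction of predictions that match the true labels. A minimal sketch with hypothetical labels containing one mistake out of five; `normalize=False` returns the count of correct predictions instead of the fraction:

```python
from sklearn.metrics import accuracy_score

y_test = ["M", "F", "F", "M", "F"]
y_pred = ["M", "F", "M", "M", "F"]   # hypothetical predictions, third one wrong

acc = accuracy_score(y_test, y_pred)                         # 4 correct / 5 = 0.8
n_correct = accuracy_score(y_test, y_pred, normalize=False)  # 4 correct predictions
```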

Confusion Matrix

>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))
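In a confusion matrix, rows are true labels and columns are predicted labels. Passing `labels` fixes their order. A minimal sketch with the same hypothetical labels as an accuracy example:

```python
from sklearn.metrics import confusion_matrix

y_test = ["M", "F", "F", "M", "F"]
y_pred = ["M", "F", "M", "M", "F"]   # hypothetical predictions

cm = confusion_matrix(y_test, y_pred, labels=["F", "M"])
# Row 0: true F -> 2 predicted F, 1 predicted M
# Row 1: true M -> 0 predicted F, 2 predicted M
print(cm)
```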

Model Tuning

Grid Search
>>> from sklearn.model_selection import GridSearchCV
>>> params = {"n_neighbors": np.arange(1, 3), "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn, param_grid=params, cv=2)   # few folds: the toy data is tiny
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)
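On a dataset large enough for standard 5-fold cross-validation, the same search can be run end to end. A minimal sketch, assuming the iris dataset and an illustrative grid of 1 to 5 neighbors:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

params = {"n_neighbors": list(range(1, 6)), "metric": ["euclidean", "cityblock"]}
grid = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=params, cv=5)
grid.fit(X, y)  # tries all 10 combinations, 5 folds each

print(grid.best_params_, grid.best_score_)  # best combination and its mean CV accuracy
best_k = grid.best_estimator_.n_neighbors   # refitted on all data with the best params
```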
 

Estimators used in supervised learning models

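Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> lr = make_pipeline(StandardScaler(), LinearRegression())   # standardize features, then fit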
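A minimal sketch of fitting ordinary least squares, assuming hypothetical noiseless data generated from y = 3·x1 − 2·x2 + 1, so the recovered coefficients can be checked directly. The pipeline variant scales features first; for plain least squares this does not change the predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0   # hypothetical noiseless linear target

lr = LinearRegression().fit(X, y)
print(lr.coef_, lr.intercept_)            # approximately [3, -2] and 1

# Scaling inside a pipeline: coefficients now refer to standardized features
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
print(model.score(X, y))                  # R^2 close to 1.0 on this noiseless data
```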

Support Vector Machines (SVM)

>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')
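A minimal sketch of training and evaluating a linear-kernel SVC, assuming the iris dataset and an illustrative stratified split; `n_support_` reports how many support vectors the model kept for each class:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

svc = SVC(kernel="linear").fit(X_train, y_train)
print(svc.score(X_test, y_test))  # mean accuracy on held-out data
print(svc.n_support_)             # number of support vectors per class
```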

K-nearest neighbors

>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
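A k-nearest-neighbors classifier labels a sample by majority vote among the closest training samples. A minimal sketch, assuming the iris dataset and an illustrative split, that also uses `kneighbors` to inspect which training points drive one prediction:

```python
from sklearn import neighbors
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

knn = neighbors.KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Distances to, and indices of, the 5 training rows closest to the first test row
dist, idx = knn.kneighbors(X_test[:1])
print(y_train[idx[0]])              # the labels that vote on this prediction
print(knn.score(X_test, y_test))    # mean accuracy on held-out data
```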