
Scikit-learn Cheat Sheet (DRAFT)

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Data Preprocessing

from sklearn.preprocessing import <classname>

StandardScaler, MinMaxScaler, RobustScaler
QuantileTransformer, PowerTransformer, FunctionTransformer
KBinsDiscretizer, PolynomialFeatures, Normalizer

# Standardize features to zero mean and unit variance
scaler = StandardScaler()

# Apply a user-defined function to the data (requires: import numpy as np)
transformer = FunctionTransformer(np.log1p)

# Discretize features into k bins
discretizer = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')

# Generate polynomial and interaction features up to degree 2
poly_features = PolynomialFeatures(degree=2)

# <object> is any of the instances above
X_scaled = <object>.fit_transform(X)
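
A minimal sketch of the transformers above applied to a toy matrix. The data values are made up for illustration and are not part of the original sheet.

```python
import numpy as np
from sklearn.preprocessing import (
    FunctionTransformer,
    KBinsDiscretizer,
    PolynomialFeatures,
    StandardScaler,
)

# Toy feature matrix (hypothetical values): 4 samples, 2 features
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# Zero mean, unit variance per column
X_scaled = StandardScaler().fit_transform(X)

# Element-wise log(1 + x)
X_log = FunctionTransformer(np.log1p).fit_transform(X)

# Three equal-width bins per column, encoded as ordinal integers 0..2
X_binned = KBinsDiscretizer(
    n_bins=3, encode='ordinal', strategy='uniform'
).fit_transform(X)

# Columns for features a, b: 1, a, b, a^2, a*b, b^2
X_poly = PolynomialFeatures(degree=2).fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_poly.shape)           # (4, 6)
```

In practice, fit on the training split only and call transform on the test split so that test statistics do not leak into the scaler.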

Encoding Categorical Data

from sklearn.preprocessing import <classname>

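LabelEncoder, OneHotEncoder, OrdinalEncoder, LabelBinarizer

ohe = OneHotEncoder()
le = LabelEncoder()
lb = LabelBinarizer()

# Encode target labels; classes are sorted, so 'No' -> 0 and 'Yes' -> 1
y = le.fit_transform(['Yes', 'No', 'Yes'])  # array([1, 0, 1])
y = lb.fit_transform(['Yes', 'No', 'Yes'])  # array([[1], [0], [1]])

# One-hot encode categorical feature columns (sparse matrix by default)
X_encoded = ohe.fit_transform(X)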
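
A short sketch contrasting the encoders: LabelEncoder and LabelBinarizer are meant for the target y, while OneHotEncoder and OrdinalEncoder work on feature columns. The colour and size values are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import (
    LabelBinarizer,
    LabelEncoder,
    OneHotEncoder,
    OrdinalEncoder,
)

labels = ['Yes', 'No', 'Yes']
y_int = LabelEncoder().fit_transform(labels)    # [1, 0, 1]
y_bin = LabelBinarizer().fit_transform(labels)  # [[1], [0], [1]]

# Hypothetical categorical features: colour and size
X = np.array([['red', 'S'], ['blue', 'M'], ['red', 'L']])

# One column per category: 2 colours + 3 sizes = 5 columns
X_onehot = OneHotEncoder().fit_transform(X).toarray()

# One integer per category, assigned in sorted order within each column
X_ordinal = OrdinalEncoder().fit_transform(X)

print(X_onehot.shape)  # (3, 5)
print(X_ordinal)
```

OrdinalEncoder imposes an order on the categories, so it tends to suit tree-based models better than linear ones unless the order is meaningful.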

Handling missing values

# enable_iterative_imputer must be imported before IterativeImputer
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer, MissingIndicator

# Choose one imputer
imputer = SimpleImputer(strategy='mean')
imputer = KNNImputer(n_neighbors=2)
imputer = IterativeImputer(random_state=0)

# Binary mask marking which values were missing
indicator = MissingIndicator()

X_imputed = imputer.fit_transform(X)
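
A small sketch running each imputer on the same matrix with two missing entries. The numbers are hypothetical and only chosen so that the column means are easy to check by hand.

```python
import numpy as np
# The experimental flag must be imported before IterativeImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import (
    IterativeImputer,
    KNNImputer,
    MissingIndicator,
    SimpleImputer,
)

# Toy data with one missing value per column
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0], [4.0, np.nan]])

# Replace NaN with the column mean: (1 + 7 + 4) / 3 = 4.0 for column 0
X_mean = SimpleImputer(strategy='mean').fit_transform(X)

# Replace NaN with the average of the 2 nearest rows
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

# Model each feature as a function of the others, iteratively
X_iter = IterativeImputer(random_state=0).fit_transform(X)

# Boolean mask of the positions that were missing
mask = MissingIndicator().fit_transform(X)

print(X_mean[1, 0])  # 4.0
print(mask)
```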

Feature Selection:

from sklearn.feature_selection import

SelectKBest, SelectPercentile, SelectFromModel, VarianceThreshold, RFE, RFECV,
SequentialFeatureSelector
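
A possible usage sketch for two of the selectors above. The iris dataset and the f_classif scoring function are choices made here for illustration; the sheet itself does not specify them.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Keep the 2 features with the highest ANOVA F-score against y
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Drop features whose variance is zero (constant columns); needs no y
X_var = VarianceThreshold(threshold=0.0).fit_transform(X)

print(selector.get_support())  # boolean mask of the kept columns
print(X_selected.shape)        # (150, 2)
```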

Dimensionality Reduction

from sklearn.decomposition import

PCA, IncrementalPCA, TruncatedSVD, KernelPCA, NMF, FastICA, LatentDirichletAllocation

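# TSNE lives in sklearn.manifold, not sklearn.decomposition
from sklearn.manifold import TSNE

pca = PCA(n_components=2)
kpca = KernelPCA(n_components=2, kernel='rbf')
tsne = TSNE(n_components=2)

# <object> is any of the instances above
X_new = <object>.fit_transform(X)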
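
A brief sketch projecting a dataset to two dimensions with the three reducers. The iris dataset and the random_state value are assumptions added for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)

# Linear projection onto the 2 directions of highest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Non-linear variant using an RBF kernel
X_kpca = KernelPCA(n_components=2, kernel='rbf').fit_transform(X)

# t-SNE is for visualisation; it has fit_transform but no transform for new data
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print(X_pca.shape, X_kpca.shape, X_tsne.shape)
print(pca.explained_variance_ratio_.sum())  # share of variance kept by PCA
```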

Pipelines:

from sklearn.pipeline import

Pipeline
FeatureUnion

from sklearn.compose import

ColumnTransformer
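
One way the three classes might be combined, shown as a sketch. The step names, column indices, and choice of estimator are arbitrary illustrations, not recommendations from the sheet.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_iris(return_X_y=True)

# Apply a different scaler to each group of columns
columns = ColumnTransformer([
    ('std', StandardScaler(), [0, 1]),
    ('minmax', MinMaxScaler(), [2, 3]),
])

# Concatenate the outputs of two transformers side by side
features = FeatureUnion([
    ('pca', PCA(n_components=2)),
    ('kbest', SelectKBest(k=1)),
])

# Chain preprocessing and the final estimator into one object
model = Pipeline([
    ('columns', columns),
    ('features', features),
    ('clf', LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

print(model.score(X, y))  # training accuracy
```

Because the whole chain is a single estimator, it can be passed to cross-validation or grid search without leaking preprocessing statistics across folds.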

Supervised Learning Models:

Linear Models:
from sklearn.linear_model import
LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression, SGDClassifier, SGDRegressor, Perceptron
Naive Bayes:
from sklearn.naive_bayes import
GaussianNB, BernoulliNB, MultinomialNB
Tree-Based Models:
from sklearn.tree import
DecisionTreeClassifier, DecisionTreeRegressor
Support Vector Machines (SVM):
from sklearn.svm import
SVC, SVR, LinearSVC, LinearSVR, NuSVC, NuSVR, OneClassSVM
Nearest Neighbors:
from sklearn.neighbors import
KNeighborsClassifier, KNeighborsRegressor, RadiusNeighborsClassifier, RadiusNeighborsRegressor
Neural Networks:
from sklearn.neural_network import
MLPClassifier, MLPRegressor
Ensemble:
from sklearn.ensemble import
RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier, GradientBoostingRegressor, ExtraTreesClassifier, ExtraTreesRegressor, AdaBoostClassifier, AdaBoostRegressor
xgboost:
from xgboost import
XGBClassifier, XGBRegressor
lightgbm:
from lightgbm import
LGBMClassifier, LGBMRegressor
catboost:
from catboost import
CatBoostClassifier, CatBoostRegressor
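
All scikit-learn estimators above share the same fit / predict / score interface, so swapping models is mostly a one-line change. The sketch below assumes the iris dataset and a train/test split via train_test_split, neither of which appears in the original sheet; the third-party boosting libraries are left out to keep it dependency-free.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A few classifiers from the families listed above
models = {
    'logreg': LogisticRegression(max_iter=1000),
    'tree': DecisionTreeClassifier(random_state=0),
    'svc': SVC(),
    'knn': KNeighborsClassifier(n_neighbors=5),
    'forest': RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                     # learn from training data
    scores[name] = model.score(X_test, y_test)      # accuracy on held-out data

for name, score in scores.items():
    print(f'{name}: {score:.3f}')
```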

Semi-Supervised Learning:

from sklearn.semi_supervised import

LabelPropagation
LabelSpreading
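
A hedged sketch of how these estimators are typically used: unlabeled samples are marked with -1 in y, and the fitted model exposes inferred labels in transduction_. The dataset, the 70% masking rate, and the kNN kernel settings are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Hide roughly 70% of the labels; -1 marks a sample as unlabeled
rng = np.random.RandomState(0)
unlabeled = rng.rand(len(y)) < 0.7
y_partial = y.copy()
y_partial[unlabeled] = -1

# Spread the known labels through a k-nearest-neighbour graph
model = LabelSpreading(kernel='knn', n_neighbors=7)
model.fit(X, y_partial)

# Labels inferred for the samples that were hidden
inferred = model.transduction_[unlabeled]
accuracy = (inferred == y[unlabeled]).mean()
print(f'accuracy on hidden labels: {accuracy:.3f}')
```

LabelPropagation follows the same pattern and accepts the same -1 convention.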

Unsupervised Learning Models

Clustering:
from sklearn.cluster import
KMeans, AgglomerativeClustering, DBSCAN, Birch, SpectralClustering
Dimensionality Reduction:
from sklearn.decomposition import
PCA, IncrementalPCA, TruncatedSVD, KernelPCA, NMF, FastICA, LatentDirichletAllocation
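
A minimal clustering sketch on synthetic data. The blobs generated by make_blobs and all hyperparameter values (n_clusters, eps, min_samples) are assumptions for illustration and would need tuning on real data.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data drawn around 3 centres
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Centroid-based: the number of clusters is fixed up front
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: merges the closest clusters until 3 remain
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Density-based: finds the number of clusters itself; -1 marks noise points
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print(sorted(set(kmeans_labels)))  # [0, 1, 2]
print(sorted(set(agglo_labels)))   # [0, 1, 2]
print(sorted(set(dbscan_labels)))
```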

Model Evaluation Metrics

from sklearn.metrics import

Regression Metrics:
mean_squared_error, r2_score, mean_absolute_error, explained_variance_score, median_absolute_error, mean_squared_log_error
Classification Metrics:
accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, average_precision_score, log_loss, confusion_matrix, classification_report
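
A short worked example of several metrics above. Every metric takes the true values first and the predictions second. The label and target vectors are made up so the results can be verified by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)

# Classification: hypothetical true and predicted binary labels
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_pred)     # 5 of 6 correct
prec = precision_score(y_true, y_pred)   # 3 TP / (3 TP + 0 FP) = 1.0
rec = recall_score(y_true, y_pred)       # 3 TP / (3 TP + 1 FN) = 0.75
f1 = f1_score(y_true, y_pred)            # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)    # rows = true class, columns = predicted

# Regression: hypothetical true and predicted targets
y_true_r = [3.0, -0.5, 2.0, 7.0]
y_pred_r = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true_r, y_pred_r)   # 0.375
mae = mean_absolute_error(y_true_r, y_pred_r)  # 0.5
r2 = r2_score(y_true_r, y_pred_r)

print(acc, prec, rec, f1)
print(cm)
print(mse, mae, r2)
```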