
scikit-learn Cheat Sheet (DRAFT) by

Cheat sheet for scikit-learn library in python.

This is a draft cheat sheet. It is a work in progress and is not finished yet.


pip install jupyter
installs jupyter
jupyter notebook
starts jupyter notebook
Creating a notebook
go to new on the upper right and click on python
shift + enter
runs the selected cell
File menu
can create a new Notebook or open a preexi­sting one. This is also where you would go to rename a Notebook. I think the most intere­sting menu item is the Save and Checkpoint option. This allows you to create checkp­oints that you can roll back to if you need to.
Edit menu
Here you can cut, copy, and paste cells. This is also where you would go if you wanted to delete, split, or merge a cell. You can reorder cells here too.
View menu
useful for toggling the visibility of the header and toolbar. You can also toggle Line Numbers within cells on or off. This is also where you would go if you want to mess about with the cell’s toolbar.
Insert menu
just for inserting cells above or below the currently selected cell.
Cell menu
allows you to run one cell, a group of cells, or all the cells. You can also go here to change a cell’s type, although the toolbar is more intuitive for that. The other handy feature in this menu is the ability to clear a cell’s output.
Kernel menu
is for working with the kernel that is running in the backgr­ound. Here you can restart the kernel, reconnect to it, shut it down, or even change which kernel your Notebook is using.
Widgets menu
is for saving and clearing widget state. Widgets are basically JavaScript widgets that you can add to your cells to make dynamic content using Python (or another Kernel).
Help menu
which is where you go to learn about the Notebook’s keyboard shortcuts, a user interface tour, and lots of reference material.
Running tab
will tell you which Notebooks and Terminals you are currently running.
cell types: Code
cell where you write code
cell types: Raw NBConvert
is only intended for special use cases when using the nbconvert command line tool. Basically it allows you to control the formatting in a very specific way when converting from a Notebook to another format.
cell types: Heading
The Heading cell type is no longer supported and will display a dialog that says as much. Instead, you are supposed to use Markdown for your Headings.
cell types: Markdown
Jupyter Notebook supports Markdown, a markup language that is a superset of HTML. Some of the things you can do in this type of cell are shown next. After a Markdown cell has been run, double-click it to edit its text again.
_italic_ or *italic*
# Header 1
## Header 2
### Header 3
You can create a list (bullet points) by using dashes, plus signs, or asterisks. There needs to be a space between the marker and the letters. To make sub lists, press tab first
For inline code highlighting, just surround the code with backticks. To insert a block of code, put triple backticks on the lines before and after it; you can also specify the programming language (for example, python) right after the opening backticks. The code itself can span multiple lines.
Exporting notebooks
When you are working with Jupyter Notebooks, you will find that you need to share your results with non-technical people. When that happens, you can use the nbconvert tool, which comes with Jupyter Notebook, to convert or export your Notebook into one of the following formats: HTML, LaTeX, PDF, RevealJS, Markdown, reStructuredText, executable script
How to Use nbconvert
Open up a terminal and navigate to the folder that contains the Notebook you wish to convert. The basic conversion command looks like this: jupyter nbconvert <input notebook> --to <output format>. Example: jupyter nbconvert py_examples.ipynb --to pdf
You can also export your currently running Notebook by going to the File menu and choosing the Download as option. This option allows you to download in all the formats that nbconvert supports. However, I recommend using nbconvert itself, because it can export multiple Notebooks at once, which is something that the menu does not support.
A Notebook extension (nbext­ension) is a JavaScript module that you load in most of the views in the Notebook’s frontend.
Where Do I Get Extens­ions?
You can use Google or search for Jupyter Notebook extens­ions.
How Do I Install Them?
jupyter nbexte­nsion install EXTENS­ION­_NAME
enable an extension after installing it
jupyter nbexte­nsion enable EXTENS­ION­_NAME
installing python packages
! pip install packag­e_name --user
If you see a greyed out menu item, try changing the cell’s type and see if the item becomes available to use.

Evaluation Metrics and Scoring

from sklear­n.m­etrics import confus­ion­_matrix
confusion = confus­ion­_ma­tri­x(y­_test, Logist­icR­egr­ess­ion­(C=­0.1­).f­it(­X_t­rain, y_trai­n).p­re­dic­t(X­_test))
Precision (positive predictive value)
Importing f-score
from sklear­n.m­etrics import f1_score
f1_score(y_test, pred_most_frequent)
Importing classi­fic­ation report
from sklear­n.m­etrics import classi­fic­ati­on_­report
classification_report(y_test, pred, target_names=["not nine", "nine"])  # pred: the model's predictions on X_test
Prediction threshold
y_pred_lower_threshold = svc.decision_function(X_test) > -.8
Classi­fic­ation report
classi­fic­ati­on_­rep­ort­(y_­test, y_pred­_lo­wer­_th­res­hold)
Importing precis­on_­rec­all­_curve
from sklear­n.m­etrics import precis­ion­_re­cal­l_curve
using the curve
precision, recall, thresholds = precision_recall_curve(y_test, svc.decision_function(X_test))
find threshold closest to zero
close_zero = np.argmin(np.abs(thresholds))
plt.plot(precision[close_zero], recall[close_zero], 'o', markersize=10, label="threshold zero", fillstyle="none", c='k', mew=2)
for random forest
precision_rf, recall_rf, thresholds_rf = precision_recall_curve(y_test, rf.predict_proba(X_test)[:, 1])
close_default_rf = np.argmin(np.abs(thresholds_rf - 0.5))
plt.plot(precision_rf[close_default_rf], recall_rf[close_default_rf], '^', c='k', markersize=10, label="threshold 0.5 rf", fillstyle="none", mew=2)
plt.xlabel("Precision")
plt.ylabel("Recall")
plt.legend(loc="best")
averag­e_p­rec­isi­on_­score (area under the curve)
from sklear­n.m­etrics import averag­e_p­rec­isi­on_­score
ap_rf = averag­e_p­rec­isi­on_­sco­re(­y_test, rf.pre­dic­t_p­rob­a(X­_te­st)[:, 1])
ap_svc = average_precision_score(y_test, svc.decision_function(X_test))
ROC curve
from sklear­n.m­etrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, svc.decision_function(X_test))
plt.plot(fpr, tpr, label="ROC Curve")
close_zero = np.argmin(np.abs(thresholds))
plt.plot(fpr[close_zero], tpr[close_zero], 'o', markersize=10, label="threshold zero", fillstyle="none", c='k', mew=2)
ROC curve's AUC
from sklear­n.m­etrics import roc_au­c_score
rf_auc = roc_au­c_s­cor­e(y­_test, rf.pre­dic­t_p­rob­a(X­_te­st)[:, 1])
svc_auc = roc_auc_score(y_test, svc.decision_function(X_test))
Micro average
computes the total number of false positives, false negatives, and true positives over all classes, and then computes precision, recall, and fscore using these counts.
f1_score(y_test, pred, average="micro")
Macro average
computes the unweighted mean of the per-class f-scores. This gives equal weight to all classes, no matter what their size is.
f1_score(y_test, pred, average="macro")
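A minimal sketch of the difference between the two averages. The labels below are made up for illustration (they are not from the cheat sheet) and are deliberately imbalanced so that the two averages differ.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical, imbalanced three-class labels (illustration only).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 0, 2, 1])

# Micro: pool TP/FP/FN over all classes, then compute one f-score.
micro = f1_score(y_true, y_pred, average="micro")
# Macro: compute one f-score per class, then take the unweighted mean.
macro = f1_score(y_true, y_pred, average="macro")
# average=None returns the per-class f-scores themselves.
per_class = f1_score(y_true, y_pred, average=None)

print("micro:", micro)
print("macro:", macro)
print("per class:", per_class)
```

For single-label multiclass problems the micro average equals plain accuracy, so it is dominated by the large class; the macro average is pulled down by the small classes.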
To change the metric used to evaluate models in cross-validation and grid search, pass the scoring argument to functions such as cross_val_score (for example, scoring="roc_auc")
If you do set a threshold, you need to be careful not to do so using
the test set. As with any other parameter, setting a decision threshold
on the test set is likely to yield overly optimistic results. Use a
validation set or cross-­val­idation instead.
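One way to follow this advice, as a sketch: pick the threshold on a separate validation set and only then touch the test set. The synthetic data, the gamma value, and the choice of "best f-score" as the selection rule are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data (assumption: stands in for your own X, y).
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, stratify=y, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, stratify=y_trainval, random_state=1)

svc = SVC(gamma=0.05).fit(X_train, y_train)

# Choose the threshold on the validation set only.
precision, recall, thresholds = precision_recall_curve(
    y_valid, svc.decision_function(X_valid))
# precision and recall have one more entry than thresholds; drop the last point.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = thresholds[np.argmax(f1)]

# Apply the chosen threshold to the untouched test set.
y_pred_test = (svc.decision_function(X_test) > best_threshold).astype(int)
print("threshold:", best_threshold)
print("test f1:", f1_score(y_test, y_pred_test))
```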

Iris data set

importing data set
from sklear­n.d­atasets import load_iris
iris_d­ataset = load_i­ris()
data set keys
Split the data into training and testing
from sklear­n.m­ode­l_s­ele­ction import train_­tes­t_split
X_train, X_test, y_train, y_test = train_test_split(iris_dataset['data'], iris_dataset['target'], train_size=0.75, test_size=0.25, random_state=0, shuffle=True, stratify=None)  # train_size and test_size are fractions between 0 and 1; shuffle=True (shuffles the data) and stratify=None are the defaults
scatter matrix
pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15), marker='o', hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)  # alpha sets the transparency
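The loading and splitting steps of this section, put together as a small runnable sketch. It relies on the default 75/25 split made explicit through test_size.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
# The returned object behaves like a dictionary.
print(iris_dataset.keys())

X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"],
    test_size=0.25, random_state=0)

print(X_train.shape, X_test.shape)
```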

Supervised Learning

In classification, the goal is to predict a class label, which is a choice from a predefined list of possibilities
In regression, the goal is to predict a continuous number, or a floating-point number in programming terms (or real number in mathematical terms)
graphic that shows nearest neighbor

Prepro­cessing and Scaling

from sklear­n.p­rep­roc­essing import MinMax­Scaler
Shifts the data such that all features are exactly between 0 and 1
scaler = MinMaxScaler(copy=True, feature_range=(0, 1)).fit(X_train)
To apply the transformation that we just learned (that is, to actually scale the training data), we use the transform method of the scaler: X_train_scaled = scaler.transform(X_train)
To apply the SVM to the scaled data, we also need to transform the test set.
X_test_scaled = scaler.transform(X_test)
learning an SVM on the scaled training data
svm = SVC(C=100).fit(X_train_scaled, y_train)
from sklear­n.p­rep­roc­essing import Standa­rdS­caler
prepro­cessing using zero mean and unit variance scaling
scaler = Standa­rdS­caler()
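The scaling workflow above, end to end, as a sketch. The breast cancer dataset is an assumption chosen for illustration; the key point is that the scaler is fit on the training data only and then applied to both sets.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Learn per-feature min and max from the training set only.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
# Reuse the same training statistics for the test set.
X_test_scaled = scaler.transform(X_test)

svm = SVC(C=100).fit(X_train_scaled, y_train)
print("test accuracy:", svm.score(X_test_scaled, y_test))
# Test features may fall slightly outside [0, 1]; that is expected.
print("test min/max:", X_test_scaled.min(), X_test_scaled.max())
```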

Ridge regression

Ridge regression
is a model tuning method used to analyse data that suffers from multicollinearity. This method performs L2 regularization. When multicollinearity occurs, least-squares estimates are unbiased but their variances are large, which results in predicted values being far away from the actual values.
from sklear­n.l­ine­ar_­model import Ridge
ridge = Ridge(­).f­it(­X_t­rain, y_train)
ridge.s­co­re(­X_t­rain, y_train)
plt.hlines(y=0, xmin=0, xmax=len(lr.coef_))  # y: y-indexes where to plot the lines
Plot horizontal lines at each y from xmin to xmax.
The Ridge model makes a trade-off between the simplicity of the model (near-zero
coeffi­cients) and its perfor­mance on the training set. How much importance the
model places on simplicity versus training set perfor­mance can be specified by the
user, using the alpha parameter. Increasing alpha forces coeffi­cients to move more toward zero, which decreases
training set perfor­mance but might help genera­liz­ation.
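A sketch of the effect of alpha. The diabetes dataset and the three alpha values are assumptions picked for illustration; the size of the coefficient vector shrinks as alpha grows.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

coef_norms = {}
for alpha in [0.01, 1, 100]:
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    # L2 norm of the coefficients: smaller means a "simpler" model.
    coef_norms[alpha] = np.linalg.norm(ridge.coef_)
    print(
        "alpha=%g  train R^2=%.3f  test R^2=%.3f  ||coef||=%.1f"
        % (alpha, ridge.score(X_train, y_train),
           ridge.score(X_test, y_test), coef_norms[alpha])
    )
```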

Linear models for classi­fic­ation

Importing logistic regression
from sklear­n.l­ine­ar_­model import Logist­icR­egr­ession
logreg = LogisticRegression(C=100).fit(X_train, y_train)
Score
logreg.score(X_train, y_train)
y_pred = Logist­icR­egr­ess­ion­().f­it­(X_­train, y_trai­n).p­re­dic­t(X­_test)
Importing SVM
from sklear­n.svm import LinearSVC
Using low values of C
will cause the algorithms to try to adjust to the “majority” of data points, while using
a higher value of C stresses the importance that each individual data point be classified correctly.

Grid Search

validation set
X_trainval, X_test, y_trainval, y_test = train_test_split(iris.data, iris.target, random_state=0)
X_train, X_valid, y_train, y_valid = train_­tes­t_s­plit( X_trai­nval, y_trai­nval, random­_st­ate=1)
Grid Search with Cross-­Val­idation
from sklear­n.m­ode­l_s­ele­ction import GridSe­archCV
grid_s­earch = GridSe­arc­hCV­(SVC(), param_­grid, cv=5)
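Find best parameters: grid_search.best_params_
Return best score: grid_search.best_score_
Access the model with the best parameters trained on the whole training set: grid_search.best_estimator_
Results of a grid search can be found in grid_search.cv_results_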
CV grid search
GridSe­arc­hCV­(SVC(), param_­grid, cv=5)
param_grid = [{'ker­nel': ['rbf'], 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}, {'kernel': ['line­ar'], 'C': [0.001, 0.01, 0.1, 1, 10, 100]}]
nested cross-­val­idation
scores = cross_val_score(GridSearchCV(SVC(), param_grid, cv=5), iris.data, iris.target, cv=5)
Grid search is a tuning technique that attempts to compute the optimum values of hyperparameters. It is an exhaustive search over the specified parameter values of a model.
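A compact, runnable sketch of a grid search with cross-validation on the iris data. The parameter grid here is a smaller, assumed one chosen to keep the run short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# 5 values of C x 4 values of gamma = 20 candidate settings.
param_grid = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
# Cross-validation happens on the training data only.
grid_search.fit(X_train, y_train)

print("best parameters:", grid_search.best_params_)
print("best cross-validation score:", grid_search.best_score_)
# score() uses the best model, refit on the whole training set.
print("test score:", grid_search.score(X_test, y_test))
```

The test set is used once, at the very end, so the reported test score is not influenced by the parameter search.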

Decision trees

Importing decision tree classifier
from sklear­n.tree import Decisi­onT­ree­Cla­ssifier
tree = Decisi­onT­ree­Cla­ssi­fie­r(r­and­om_­sta­te=0)
tree.f­it(­X_t­rain, y_train)
tree.s­cor­e(X­_train, y_train)
Argument in Decisi­onT­ree­Cla­ssi­fier: max_de­pth=4
Other arguments
max_le­af_­nodes, or min_sa­mpl­es_leaf
Import tree diagram
from sklear­n.tree import export­_gr­aphviz
Build tree diagram
export­_gr­aph­viz­(tree, out_fi­le=­"­tre­e.d­ot", class_­nam­es=­["ma­lig­nan­t", "­ben­ign­"], featur­e_n­ame­s=c­anc­er.f­ea­tur­e_n­ames, impuri­ty=­False, filled­=True)
Feature importance: tree.feature_importances_
Decision tree regressor importing
from sklear­n.tree import Decisi­onT­ree­Reg­ressor
Decisi­onT­ree­Reg­res­sor­().f­it­(X_­train, y_train)
y_train = np.log­(da­ta_­tra­in.p­rice)
Random Forest import
from sklear­n.e­nsemble import Random­For­est­Cla­ssifier
Random Forest
forest = Random­For­est­Cla­ssi­fie­r(n­_es­tim­ato­rs=5, random­_st­ate=2)
Train
forest.fit(X_train, y_train)
gradient boosted trees import
from sklear­n.e­nsemble import Gradie­ntB­oos­tin­gCl­ass­ifier
Gradient boost
gbrt = Gradie­ntB­oos­tin­gCl­ass­ifi­er(­ran­dom­_st­ate=0)
gbrt.f­it(­X_t­rain, y_train)
gbrt.s­cor­e(X­_test, y_test)
max_depth, learni­ng_rate
often the default parameters of the random forest already work quite well.
You can set n_jobs=-1 to use all the cores in
your computer in the random forest.
In general, it's a good rule of thumb to use max_features=sqrt(n_features) for classification and max_features=log2(n_features) for regression.
Gradient boosted trees are frequently the winning entries in machine learning compet­itions, and are widely used in industry.
First try a random forest, then gradient boosting
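A sketch comparing the two ensembles on the same split. The breast cancer dataset and the specific settings (100 trees, max_depth=1 for boosting) are assumptions for illustration, not recommendations from the cheat sheet.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Random forest: n_jobs=-1 uses all available cores.
forest = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
forest.fit(X_train, y_train)

# Gradient boosting: shallow trees, each correcting the previous ones.
gbrt = GradientBoostingClassifier(random_state=0, max_depth=1)
gbrt.fit(X_train, y_train)

print("forest test accuracy:", forest.score(X_test, y_test))
print("gbrt test accuracy:", gbrt.score(X_test, y_test))
# One importance value per feature; the values sum to 1.
print("forest importances:", forest.feature_importances_.round(3))
```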

Uncert­ainty Estimates from Classi­fiers

decision_function(X): evaluates the decision function for the samples in X.
predict_proba(X): returns the predicted probability of each class for the samples in X.
A model is called calibrated if the
reported uncert­ainty actually matches how correct it is—in a calibrated model, a prediction
made with 70% certainty would be correct 70% of the time.
To summarize, predict_proba and decision_function always have shape (n_samples, n_classes), apart from decision_function in the special binary case. In the binary case, decision_function only has one column, corresponding to the "positive" class classes_[1].
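The shapes described above can be checked directly. This is a sketch using a gradient boosting classifier on one synthetic binary problem and on the three-class iris data; both dataset choices are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris, make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Binary case: decision_function has a single value per sample.
X_bin, y_bin = make_classification(n_samples=200, random_state=0)
gbrt_bin = GradientBoostingClassifier(random_state=0).fit(X_bin, y_bin)
dec_bin = gbrt_bin.decision_function(X_bin)
proba_bin = gbrt_bin.predict_proba(X_bin)
print("binary:", dec_bin.shape, proba_bin.shape)

# Multiclass case: one column per class for both methods.
X_multi, y_multi = load_iris(return_X_y=True)
gbrt_multi = GradientBoostingClassifier(random_state=0).fit(X_multi, y_multi)
dec_multi = gbrt_multi.decision_function(X_multi)
proba_multi = gbrt_multi.predict_proba(X_multi)
print("multiclass:", dec_multi.shape, proba_multi.shape)

# Each row of predict_proba sums to 1.
print(np.allclose(proba_multi.sum(axis=1), 1.0))
```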

Feature selection

Importing variance threshold
from sklear­n.f­eat­ure­_se­lection import Varian­ceT­hre­shold
Removing features with low variance
sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
sel.fit_transform(X)
SelectKBest removes all but the k highest scoring features
SelectPercentile removes all but a user-specified highest scoring percentage of features
SelectFpr, SelectFdr, and SelectFwe use common univariate statistical tests for each feature: false positive rate (SelectFpr), false discovery rate (SelectFdr), or family-wise error (SelectFwe).
GenericUnivariateSelect performs univariate feature selection with a configurable strategy.
importing Select­KBest
from sklear­n.f­eat­ure­_se­lection import Select­KBest
importing chi2
from sklear­n.f­eat­ure­_se­lection import chi2
X_new = Select­KBe­st(­chi2, k=2).f­it_­tra­nsf­orm(X, y)
Recursive feature elimin­ation
from sklear­n.f­eat­ure­_se­lection import RFE
rfe = RFE(estimator=svc, n_features_to_select=1, step=1).fit(X, y)
Recursive feature elimin­ation with cross-­val­idation
from sklear­n.f­eat­ure­_se­lection import RFECV
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2), scoring="accuracy", min_features_to_select=min_features_to_select)
import Strati­fie­dKFold
from sklear­n.m­ode­l_s­ele­ction import Strati­fie­dKFold
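A short sketch of two of the selectors above applied to the iris data. The variance threshold of 0.2 is an assumed value chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_iris(return_X_y=True)

# Drop features whose variance is below the threshold (uses X only).
X_var = VarianceThreshold(threshold=0.2).fit_transform(X)

# Keep the 2 features with the highest chi-squared score against y.
selector = SelectKBest(chi2, k=2).fit(X, y)
X_new = selector.transform(X)

print(X.shape, X_var.shape, X_new.shape)
# Boolean mask of the features that were kept.
print(selector.get_support())
```

VarianceThreshold never looks at y, so it can also be used for unsupervised problems; SelectKBest needs the target to score each feature.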

Model Evaluation and Improv­ement

Importing cross validation
from sklear­n.m­ode­l_s­ele­ction import cross_­val­_score
scores = cross_val_score(model, data, target, cv=5)  # model: an estimator that has not been fit yet
Summar­izing cross-­val­idation scores
stratified k-fold cross-­val­idation
In stratified cross-­val­ida­tion, we split the data such that the propor­tions between classes are the same in each fold as they are in the whole dataset
Provides train/test indices to split data in train/test sets.
KFold(­n_s­pli­ts=5, *, shuffl­e=F­alse, random­_st­ate­=None)
cross_val_score(logreg, iris.data, iris.target, cv=kfold)
Importing Leave-­one-out cross-­val­idation
from sklear­n.m­ode­l_s­ele­ction import LeaveO­neOut
Leave-­one-out cross-­val­idation
loo = LeaveO­neOut()
scores = cross_val_score(logreg, iris.data, iris.target, cv=loo)
shuffl­e-split cross-­val­idation
each split samples train_size many points for the training set and test_size many (disjoint) points for the test set
import shuffl­e-split
from sklear­n.m­ode­l_s­ele­ction import Shuffl­eSplit
shuffl­e_split = Shuffl­eSp­lit­(te­st_­siz­e=.5, train_­siz­e=.5, n_spli­ts=10)
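scores = cross_val_score(logreg, iris.data, iris.target, cv=shuffle_split)
GroupKFold takes an array of groups as an argument that we can use to indicate groups in the data that should not be split when creating the training and test sets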
Import GroupKFold
from sklear­n.m­ode­l_s­ele­ction import GroupKFold
scores = cross_val_score(logreg, X, y, groups=groups, cv=GroupKFold(n_splits=3))
Predicting with cross-­val­idation
sklearn.model_selection.cross_val_predict(estimator, X, y=None, *, groups=None, cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', method='predict')
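Why stratification matters, as a sketch. The iris samples are stored sorted by class, so a plain three-fold split without shuffling puts exactly one class in each test fold. The use of LogisticRegression with max_iter=1000 is an assumption for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

iris = load_iris()
logreg = LogisticRegression(max_iter=1000)

# Plain k-fold, no shuffling: each test fold holds a class
# that never appears in the corresponding training folds.
kfold = KFold(n_splits=3)
scores_kfold = cross_val_score(logreg, iris.data, iris.target, cv=kfold)

# Stratified k-fold keeps the class proportions in every fold.
strat = StratifiedKFold(n_splits=3)
scores_strat = cross_val_score(logreg, iris.data, iris.target, cv=strat)

print("KFold:", scores_kfold)
print("StratifiedKFold:", scores_strat)
print("mean stratified accuracy:", scores_strat.mean())
```

Passing shuffle=True (with a random_state) to KFold is another way to avoid the sorted-labels problem.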

Multilayer percep­trons (MLPs) or neural networks

from sklear­n.n­eur­al_­network import MLPCla­ssifier
mlp = MLPClassifier(solver='lbfgs', activation='tanh', random_state=0, hidden_layer_sizes=[10, 10]).fit(X_train, y_train)
There can be more than one hidden layer; to get several, pass a list to hidden_layer_sizes with one entry per layer
If we want a smoother decision boundary, we could add more hidden units, add a second hidden layer, or use the tanh nonlin­earity
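A runnable sketch of a two-hidden-layer network. The two-moons toy data and max_iter=1000 are assumptions for illustration; the shapes of coefs_ show how hidden_layer_sizes maps to weight matrices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Two hidden layers with 10 units each and tanh nonlinearity.
mlp = MLPClassifier(solver="lbfgs", activation="tanh", random_state=0,
                    hidden_layer_sizes=[10, 10], max_iter=1000)
mlp.fit(X_train, y_train)

print("test accuracy:", mlp.score(X_test, y_test))
# One weight matrix per connection between consecutive layers.
print([w.shape for w in mlp.coefs_])
```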

Naive Bayes Classi­fiers

from sklear­n.n­aiv­e_bayes import GaussianNB
Train and predict
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)
class sklear­n.n­aiv­e_b­aye­s.G­aus­sia­nNB(*, priors­=None, var_sm­oot­hin­g=1­e-09)
There are three kinds of naive Bayes classi­fiers implem­ented in scikit­-learn: Gaussi­anNB, Bernou­lliNB, and Multin­omi­alNB. GaussianNB can be applied to
any continuous data, while Bernou­lliNB assumes binary data and Multin­omialNB
assumes count data (that is, that each feature represents an integer count of something).
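A sketch of GaussianNB on continuous data. The iris dataset and the 50/50 split are assumptions for illustration; the variable name gnb is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

gnb = GaussianNB()
# fit returns the estimator, so fit and predict can be chained.
y_pred = gnb.fit(X_train, y_train).predict(X_test)

print("mislabeled points out of %d: %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))
# Per-class mean of every feature, learned from the training data.
print(gnb.theta_.shape)
```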

Linear models for multiclass classi­fic­ation

from sklear­n.svm import LinearSVC
Train linear SVC
linear_svm = Linear­SVC­().f­it(X, y)
Import SVC
from sklear­n.svm import SVC
svm = SVC(kernel='rbf', C=10, gamma=0.1).fit(X, y)  # kernel: function to use with the kernel trick; C: regularization parameter; gamma: controls the width of the Gaussian kernel
plot support vectors
class labels of support vectors are given by the sign of the dual coeffi­cients
sv_labels = svm.du­al_­coe­f_.r­avel() > 0
Rescaling method for kernel SVMs
min_on­_tr­aining = X_trai­n.m­in(­axis=0)
range_­on_­tra­ining = (X_train - min_on­_tr­ain­ing­).m­ax(­axis=0)
X_trai­n_s­caled = (X_train - min_on­_tr­aining) / range_­on_­tra­ining
X_test­_scaled = (X_test - min_on­_tr­aining) / range_­on_­tra­ining
A common technique to extend a binary classification algorithm to a multiclass classification
algorithm is the one-vs.-rest approach. In the one-vs.-rest approach, a binary model is
learned for each class that tries to separate that class from all of the other classes,
resulting in as many binary models as there are classes.
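The one-vs.-rest structure shows up in the shape of the learned coefficients. A sketch on assumed toy data (three blobs in two dimensions):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# make_blobs defaults: 100 samples, 2 features, 3 classes.
X, y = make_blobs(random_state=42)
linear_svm = LinearSVC().fit(X, y)

# One row of coefficients (one binary model) per class.
print("coef_ shape:", linear_svm.coef_.shape)
# One intercept per class.
print("intercept_ shape:", linear_svm.intercept_.shape)
```

To classify a point, each of the binary models is evaluated and the class whose model gives the highest score wins.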


Using the lasso also restricts coefficients to be close to zero, but in a slightly different way, called L1 regularization. The consequence of L1 regularization is that when using the lasso, some coefficients are exactly zero. This means some features are entirely ignored by the model.
from sklear­n.l­ine­ar_­model import Lasso
lasso = Lasso(alpha=0.01, max_iter=100000).fit(X_train, y_train)
lasso.s­co­re(­X_t­rain, y_train)
Coeffi­cients used
np.sum(lasso.coef_ != 0)
Figure legend
In practice, ridge regression is usually the first choice between these two models.
However, if you have a large number of features and expect only a few of them to be
important, Lasso might be a better choice.
Note: There is a class called ElasticNet , which combines the penalties of Lasso and Ridge.
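A sketch of how alpha controls the number of features the lasso actually uses. The diabetes dataset and the alpha values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_used = {}
for alpha in [0.01, 0.1, 1]:
    lasso = Lasso(alpha=alpha, max_iter=100000).fit(X_train, y_train)
    # Count the coefficients that were not set exactly to zero.
    n_used[alpha] = int(np.sum(lasso.coef_ != 0))
    print("alpha=%g  test R^2=%.3f  features used=%d"
          % (alpha, lasso.score(X_test, y_test), n_used[alpha]))
```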

Linear models for regression

from sklear­n.l­ine­ar_­model import Linear­Reg­ression
Split data set (from sklear­n.m­ode­l_s­ele­ction import train_­tes­t_s­plit)
X_train, X_test, y_train, y_test = train_­tes­t_s­plit(X, y, random­_st­ate=42)
linear regression
lr = Linear­Reg­res­sio­n().fi­t(X­_train, y_train)
lr.sco­re(­X_t­rain, y_train)
scikit­-learn always stores anything
that is derived from the training data in attributes that end with a
trailing unders­core. That is to separate them from parameters that
are set by the user.
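The trailing-underscore convention in practice, as a sketch on assumed synthetic data: learned attributes only exist after fit.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

unfitted = LinearRegression()
# Before fit, the learned attributes are not there yet.
print(hasattr(unfitted, "coef_"))

lr = LinearRegression().fit(X_train, y_train)
# After fit: one slope per feature, plus a single intercept.
print("coef_:", lr.coef_)
print("intercept_:", lr.intercept_)
# fit_intercept has no underscore: it is a parameter set by the user.
print("fit_intercept:", lr.fit_intercept)
```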

k-nearest neighbors

from sklear­n.n­eig­hbors import KNeigh­bor­sCl­ass­ifier
k-nearest neighbors
knn = KNeighborsClassifier(n_neighbors=1)  # n_neighbors: number of neighbors
Building a model on the training set
knn.fit(X_train, y_train)
The fit method returns the knn object itself (and modifies it in place), so we get a string repres­ent­ation of our classi­fier. The repres­ent­ation shows us which parameters were used in creating the model.
prediction = knn.predict(data)
np.mean(y_pred == y_test)
knn.score(X_test, y_test)
The k-nearest neighbors classi­fic­ation algorithm
is implem­ented in the KNeigh­bor­sCl­ass­ifier class in the neighbors module.
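The k-nearest neighbors steps of this section as one runnable sketch on the iris data. The single new sample is made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Hypothetical new measurement; scikit-learn expects a 2D array.
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])
prediction = knn.predict(X_new)
print("predicted class:", iris_dataset["target_names"][prediction])

# Two equivalent ways to compute test-set accuracy.
y_pred = knn.predict(X_test)
print(np.mean(y_pred == y_test), knn.score(X_test, y_test))
```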