scikit-learn Cheat Sheet

Jupyter

pip install jupyter	installs jupyter
jupyter notebook	starts jupyter notebook
Creating a notebook	click New in the upper right and choose Python
Run	shift + enter
File menu	can create a new Notebook or open a preexisting one. This is also where you would go to rename a Notebook. I think the most interesting menu item is the Save and Checkpoint option. This allows you to create checkpoints that you can roll back to if you need to.
Edit menu	Here you can cut, copy, and paste cells. This is also where you would go if you wanted to delete, split, or merge a cell. You can reorder cells here too.
View menu	useful for toggling the visibility of the header and toolbar. You can also toggle Line Numbers within cells on or off. This is also where you would go if you want to mess about with the cell’s toolbar.
Insert menu	just for inserting cells above or below the currently selected cell.
Cell menu	allows you to run one cell, a group of cells, or all the cells. You can also go here to change a cell’s type, although the toolbar is more intuitive for that. The other handy feature in this menu is the ability to clear a cell’s output.
Kernel menu	is for working with the kernel that is running in the background. Here you can restart the kernel, reconnect to it, shut it down, or even change which kernel your Notebook is using.
Widgets menu	is for saving and clearing widget state. Widgets are basically JavaScript widgets that you can add to your cells to make dynamic content using Python (or another Kernel).
Help menu	which is where you go to learn about the Notebook’s keyboard shortcuts, a user interface tour, and lots of reference material.
Running tab	will tell you which Notebooks and Terminals you are currently running.
cell types: Code	cell where you write code
cell types: Raw NBConvert	is only intended for special use cases when using the nbconvert command line tool. Basically it allows you to control the formatting in a very specific way when converting from a Notebook to another format.
cell types: Heading	The Heading cell type is no longer supported and will display a dialog that says as much. Instead, you are supposed to use Markdown for your Headings.
cell types: Markdown	Jupyter Notebook supports Markdown, which is a markup language that is a superset of HTML. Some common uses of this cell type are listed next. A rendered Markdown cell can be edited again by double-clicking it.
	_italic_ or *italic*
	# Header 1
	## Header 2
	### Header 3
	You can create a list (bullet points) by using dashes, plus signs, or asterisks. There needs to be a space between the marker and the letters. To make sub lists, press tab first
	For inline code highlighting, just surround the code with backticks. If you want to insert a block of code, you can use triple backticks and also specify the programming language:
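	Open the block with three backticks followed by python, put the code on the following lines, and close it with three backticks on a line of their own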
Exporting notebooks	When you are working with Jupyter Notebooks, you will find that you need to share your results with non-technical people. When that happens, you can use the nbconvert tool, which comes with Jupyter Notebook, to convert or export your Notebook into one of the following formats: HTML, LaTeX, PDF, RevealJS, Markdown, reStructuredText, executable script
How to Use nbconvert	Open up a terminal and navigate to the folder that contains the Notebook you wish to convert. The basic conversion command looks like this: jupyter nbconvert <input notebook> --to <output format>. Example: jupyter nbconvert py_examples.ipynb --to pdf
	You can also export your currently running Notebook by going to the File menu and choosing the Download as option. This option allows you to download in all the formats that nbconvert supports. However, I recommend using nbconvert from the command line, since it can export multiple Notebooks at once, which the menu does not support.
Extensions	A Notebook extension (nbextension) is a JavaScript module that you load in most of the views in the Notebook’s frontend.
Where Do I Get Extensions?	You can use Google or search for Jupyter Notebook extensions.
How Do I Install Them?	jupyter nbextension install EXTENSION_NAME
enable an extension after installing it	jupyter nbextension enable EXTENSION_NAME
installing python packages	! pip install package_name --user

If you see a greyed out menu item, try changing the cell’s type and see if the item becomes available to use.

Evaluation Metrics and Scoring

Importing	from sklearn.metrics import confusion_matrix
	confusion = confusion_matrix(y_test, LogisticRegression(C=0.1).fit(X_train, y_train).predict(X_test))
Accuracy	(TP+TN)/(TP+TN+FP+FN)
Precision (positive predictive value)	TP/(TP+FP)
Recall	TP/(TP+FN)
f-score	2*(precision*recall)/(precision+recall)
Importing f-score	from sklearn.metrics import f1_score
f1_score	f1_score(y_test, pred_most_frequent)
Importing classification report	from sklearn.metrics import classification_report
	classification_report(y_test, pred, target_names=["not nine", "nine"])  # pred holds the model's predictions on X_test
Prediction threshold	y_pred_lower_threshold = svc.decision_function(X_test) > -.8
Classification report	classification_report(y_test, y_pred_lower_threshold)
Importing precision_recall_curve	from sklearn.metrics import precision_recall_curve
using the curve	precision, recall, thresholds = precision_recall_curve( y_test, svc.decision_function(X_test))
find threshold closest to zero	close_zero = np.argmin(np.abs(thresholds))
	plt.plot(precision[close_zero], recall[close_zero], 'o', markersize=10, label="threshold zero", fillstyle="none", c='k', mew=2)
for random forest	precision_rf, recall_rf, thresholds_rf = precision_recall_curve( y_test, rf.predict_proba(X_test)[:, 1])
	close_default_rf = np.argmin(np.abs(thresholds_rf - 0.5))
	plt.plot(precision_rf[close_default_rf], recall_rf[close_default_rf], '^', c='k', markersize=10, label="threshold 0.5 rf", fillstyle="none", mew=2)
	plt.xlabel("Precision"); plt.ylabel("Recall"); plt.legend(loc="best")
average_precision_score (area under the curve)	from sklearn.metrics import average_precision_score
	ap_rf = average_precision_score(y_test, rf.predict_proba(X_test)[:, 1])
	ap_svc = average_precision_score(y_test, svc.decision_function(X_test))
ROC curve	from sklearn.metrics import roc_curve
	fpr, tpr, thresholds = roc_curve(y_test, svc.decision_function(X_test))
	plt.plot(fpr, tpr, label="ROC Curve")
	close_zero = np.argmin(np.abs(thresholds))
	plt.plot(fpr[close_zero], tpr[close_zero], 'o', markersize=10, label="threshold zero", fillstyle="none", c='k', mew=2)
ROC curve's AUC	from sklearn.metrics import roc_auc_score
	rf_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
	svc_auc = roc_auc_score(y_test, svc.decision_function(X_test))
Micro average	computes the total number of false positives, false negatives, and true positives over all classes, and then computes precision, recall, and fscore using these counts.
	f1_score(y_test, pred, average="micro")
Macro average	computes the unweighted mean of the per-class f-scores. This gives equal weight to all classes, no matter what their size is.
	f1_score(y_test, pred, average="macro")
Changing the evaluation metric in cross-validation and grid search	pass the scoring argument to functions such as cross_val_score or GridSearchCV, e.g. scoring="accuracy"
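
As a quick check of the formulas in this table, here is a minimal sketch that computes accuracy, precision, recall and f-score by hand from a confusion matrix and compares them with the library functions. The ten labels in y_true and y_pred are made up for illustration only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Tiny hand-made example (assumption: labels chosen only for illustration).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)

# The library functions should agree with the hand-computed values.
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```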

If you do set a threshold, you need to be careful not to do so using
the test set. As with any other parameter, setting a decision threshold
on the test set is likely to yield overly optimistic results. Use a
validation set or cross-validation instead.
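
The advice above can be sketched as follows: the threshold is chosen on a validation split and the test set is used only once at the end. The synthetic data from make_classification, the candidate threshold grid, and the choice of f1_score as the selection criterion are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced binary problem (assumption: stands in for real data).
X, y = make_classification(n_samples=600, weights=[0.85, 0.15], random_state=0)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, stratify=y, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, stratify=y_trainval, random_state=1)

svc = SVC(gamma=0.05).fit(X_train, y_train)

# Candidate thresholds on the decision function (hypothetical grid).
candidates = np.linspace(-1.0, 1.0, 21)
valid_scores = svc.decision_function(X_valid)
valid_f1 = [
    f1_score(y_valid, (valid_scores > t).astype(int), zero_division=0)
    for t in candidates
]
best_threshold = candidates[int(np.argmax(valid_f1))]

# The test set is touched only once, with the threshold already fixed.
test_pred = (svc.decision_function(X_test) > best_threshold).astype(int)
test_f1 = f1_score(y_test, test_pred, zero_division=0)
print(best_threshold, test_f1)
```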

Iris data set

importing data set	from sklearn.datasets import load_iris
	iris_dataset = load_iris()
data set keys	iris_dataset.keys()
Split the data into training and testing	from sklearn.model_selection import train_test_split
	X_train, X_test, y_train, y_test = train_test_split(iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=0, shuffle=True, stratify=None)
	train_size and test_size are fractions between 0 and 1 (or absolute sample counts); test_size=0.25, shuffle=True (shuffles the data) and stratify=None are the defaults
scatter matrix	pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15), marker='o', hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)  # alpha sets the transparency
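
A minimal sketch of the split on the iris data, showing the resulting shapes and what the stratify argument changes. The variable names ending in _s are introduced here only to keep the stratified split separate.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
print(iris_dataset.keys())

# Default split: 75% training, 25% test, shuffled.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
print(X_train.shape, X_test.shape)

# Stratified split: class proportions are preserved in both parts.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    iris_dataset["data"], iris_dataset["target"],
    random_state=0, stratify=iris_dataset["target"])
print(np.bincount(y_test), np.bincount(y_test_s))
```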

Supervised Learning

classification	the goal is to predict a class label, which is a choice from a predefined list of possibilities
regression	the goal is to predict a continuous number, or a floating-point number in programming terms (or real number in mathematical terms)
graphic that shows nearest neighbor	mglearn.plots.plot_knn_classification(n_neighbors=1)

Preprocessing and Scaling

Importing	from sklearn.preprocessing import MinMaxScaler
Shifts the data such that all features are exactly between 0 and 1	scaler = MinMaxScaler(copy=True, feature_range=(0, 1))
	scaler.fit(X_train)
To apply the transformation that we just learned (that is, to actually scale the training data) we use the transform method of the scaler	X_train_scaled = scaler.transform(X_train)
To apply the SVM to the scaled data, we also need to transform the test set.	X_test_scaled = scaler.transform(X_test)
learning an SVM on the scaled training data (from sklearn.svm import SVC)	svm = SVC(C=100)
	svm.fit(X_train_scaled, y_train)
Importing	from sklearn.preprocessing import StandardScaler
preprocessing using zero mean and unit variance scaling	scaler = StandardScaler()
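
Put together, the scaling rows above might look like the sketch below. The breast cancer dataset and the random_state value are assumptions chosen for illustration. The scaler is fitted on the training data only and then applied to both sets.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# Learn per-feature min and max from the training set only.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training features span exactly [0, 1]; test features may fall slightly outside.
print(X_train_scaled.min(axis=0).min(), X_train_scaled.max(axis=0).max())
print(X_test_scaled.min(), X_test_scaled.max())

svm = SVC(C=100).fit(X_train_scaled, y_train)
test_accuracy = svm.score(X_test_scaled, y_test)
print(test_accuracy)
```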

Ridge regression

Ridge regression	is a model tuning method used to analyse data that suffers from multicollinearity. It performs L2 regularization. Under multicollinearity, least-squares estimates are unbiased but have large variances, so predicted values can be far away from the actual values.
Importing	from sklearn.linear_model import Ridge
Train	ridge = Ridge().fit(X_train, y_train)
R^2	ridge.score(X_train, y_train)
Plot a horizontal line at y=0 from xmin=0 to xmax=len(lr.coef_)	plt.hlines(0, 0, len(lr.coef_))

The Ridge model makes a trade-off between the simplicity of the model (near-zero
coefficients) and its performance on the training set. How much importance the
model places on simplicity versus training set performance can be specified by the
user, using the alpha parameter. Increasing alpha forces coefficients to move more toward zero, which decreases
training set performance but might help generalization.
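
To illustrate the effect of alpha described above, here is a small sketch that fits Ridge with three alpha values and records the size of the coefficient vector and the training R^2. The diabetes dataset and the specific alpha values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = [0.01, 1, 100]
coef_norms, train_scores = [], []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    # Larger alpha pushes the coefficients toward zero.
    coef_norms.append(np.linalg.norm(ridge.coef_))
    train_scores.append(ridge.score(X_train, y_train))
    print(alpha, coef_norms[-1], train_scores[-1], ridge.score(X_test, y_test))
```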

Linear models for classification

Importing logistic regression	from sklearn.linear_model import LogisticRegression
Train	logreg = LogisticRegression(C=100).fit(X_train, y_train)
Score	logreg.score(X_train, y_train)
Predict	y_pred = LogisticRegression().fit(X_train, y_train).predict(X_test)
Importing SVM	from sklearn.svm import LinearSVC

Using low values of C
will cause the algorithms to try to adjust to the “majority” of data points, while using
a higher value of C stresses the importance that each individual data point be classified
correctly.
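
A rough sketch of how C changes a logistic regression model: smaller C means stronger regularization and smaller coefficients. The breast cancer dataset, the standardization step, and the three C values are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Standardize so the solver converges easily (assumption: not required by the sheet).
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

results = {}
for C in [0.01, 1, 100]:
    logreg = LogisticRegression(C=C, max_iter=10000).fit(X_train_s, y_train)
    results[C] = (
        np.linalg.norm(logreg.coef_),        # size of the coefficient vector
        logreg.score(X_train_s, y_train),    # training accuracy
        logreg.score(X_test_s, y_test),      # test accuracy
    )
    print(C, results[C])
```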

Grid Search

validation set	X_trainval, X_test, y_trainval, y_test = train_test_split( iris.data, iris.target, random_state=0)
	X_train, X_valid, y_train, y_valid = train_test_split( X_trainval, y_trainval, random_state=1)
Grid Search with Cross-Validation	from sklearn.model_selection import GridSearchCV
Training	grid_search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)
Find best parameters	grid_search.best_params_
return best score	grid_search.best_score_
best_estimator_	access the model with the best parameters trained on the whole training set
Results of a grid search can be found in	grid_search.cv_results_
CV grid search	GridSearchCV(SVC(), param_grid, cv=5)
	param_grid = [{'kernel': ['rbf'], 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}, {'kernel': ['linear'], 'C': [0.001, 0.01, 0.1, 1, 10, 100]}]
nested cross-validation	scores = cross_val_score(GridSearchCV(SVC(), param_grid, cv=5), iris.data, iris.target, cv=5)

Grid search is a tuning technique that attempts to find the best values of the hyperparameters. It is an exhaustive search over a specified set of parameter values for a model.
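
A minimal end-to-end sketch of the rows above on the iris data. The parameter grid is a simplified single-dictionary version of the one in the table, and the split's random_state is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1, 10, 100],
}

# Every combination (6 x 6 = 36) is evaluated with 5-fold cross-validation
# on the training data; the test set is held back for the final score.
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(grid_search.best_score_)
test_score = grid_search.score(X_test, y_test)
print(test_score)
```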

Decision trees

Importing	from sklearn.tree import DecisionTreeClassifier
Tree	tree = DecisionTreeClassifier(random_state=0)
Train	tree.fit(X_train, y_train)
Score	tree.score(X_train, y_train)
Pre-pruning	Argument in DecisionTreeClassifier: max_depth=4
Other arguments	max_leaf_nodes, or min_samples_leaf
Import tree diagram	from sklearn.tree import export_graphviz
Build tree diagram	export_graphviz(tree, out_file="tree.dot", class_names=["malignant", "benign"], feature_names=cancer.feature_names, impurity=False, filled=True)
Feature importance	tree.feature_importances_
Predict	tree.predict(X_all)
Decision tree regressor importing	from sklearn.tree import DecisionTreeRegressor
Train	DecisionTreeRegressor().fit(X_train, y_train)
log	y_train = np.log(data_train.price)
exponential	np.exp(pred_tree)
Random Forest import	from sklearn.ensemble import RandomForestClassifier
Random Forest	forest = RandomForestClassifier(n_estimators=5, random_state=2)
Train	forest.fit(X_train, y_train)
gradient boosted trees import	from sklearn.ensemble import GradientBoostingClassifier
Gradient boost	gbrt = GradientBoostingClassifier(random_state=0)
Train	gbrt.fit(X_train, y_train)
Score	gbrt.score(X_test, y_test)
Arguments	max_depth, learning_rate

The default parameters of the random forest often already work quite well.
You can set n_jobs=-1 to use all the cores in
your computer in the random forest.
A good rule of thumb for max_features is sqrt(n_features) for classification
and log2(n_features) for regression.
Gradient boosted trees are frequently the winning entries in machine learning competitions, and are widely used in industry.
Try a random forest first, then gradient boosting.
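
The tree-based models in this section can be compared side by side with a sketch like the one below. The breast cancer dataset, the split, and n_estimators=100 are assumptions chosen for illustration, not settings taken from the sheet.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

models = {
    "unpruned tree": DecisionTreeClassifier(random_state=0),
    "tree, max_depth=4": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # (training accuracy, test accuracy)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(name, scores[name])

# Importances are non-negative and sum to 1 across features.
importances = models["random forest"].feature_importances_
print(importances.sum())
```

The unpruned tree fits the training set perfectly, which is the overfitting that pre-pruning with max_depth is meant to limit.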

Uncertainty Estimates from Classifiers

Evaluate the decision function for the samples in X.	model.decision_function(X_test)[:6]
Return the predicted probability of each class	model.predict_proba(X_test[:6])

A model is called calibrated if the
reported uncertainty actually matches how correct it is: in a calibrated model, a prediction
made with 70% certainty would be correct 70% of the time.
To summarize, predict_proba and decision_function always have shape (n_samples, n_classes),
apart from decision_function in the special binary case. In the
binary case, decision_function returns a single value per sample, corresponding to the “positive”
class classes_[1].
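
The shapes described above can be checked with a short sketch. LogisticRegression and the iris data are assumptions chosen for brevity. The binary task is obtained by keeping only classes 0 and 1 in the training set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42)

# Multiclass: one column per class for both methods.
multi = LogisticRegression(max_iter=1000).fit(X_train, y_train)
multi_dec = multi.decision_function(X_test)
multi_proba = multi.predict_proba(X_test)
print(multi_dec.shape, multi_proba.shape)

# Binary: train on classes 0 and 1 only (assumption: a convenient binary task).
train_mask = y_train < 2
binary = LogisticRegression(max_iter=1000).fit(X_train[train_mask], y_train[train_mask])
bin_dec = binary.decision_function(X_test)
bin_proba = binary.predict_proba(X_test)
print(bin_dec.shape, bin_proba.shape)

# Positive decision values correspond to classes_[1].
bin_pred = binary.classes_[(bin_dec > 0).astype(int)]
```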

Feature selection

Importing variance threshold	from sklearn.feature_selection import VarianceThreshold
Removing columns with low variance (below the threshold)	sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
	sel.fit_transform(X)
SelectKBest	removes all but the k highest scoring features
SelectPercentile	removes all but a user-specified highest scoring percentage of features
SelectFpr, SelectFdr, SelectFwe	select features using common univariate statistical tests for each feature: false positive rate (SelectFpr), false discovery rate (SelectFdr), or family-wise error (SelectFwe)
GenericUnivariateSelect	performs univariate feature selection with a configurable strategy.
importing SelectKBest	from sklearn.feature_selection import SelectKBest
importing chi2	from sklearn.feature_selection import chi2
	X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
Recursive feature elimination	from sklearn.feature_selection import RFE
	rfe = RFE(estimator=svc, n_features_to_select=1, step=1)
	rfe.fit(X, y)
Recursive feature elimination with cross-validation	from sklearn.feature_selection import RFECV
	rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2), scoring="accuracy", min_features_to_select=1)
import StratifiedKFold	from sklearn.model_selection import StratifiedKFold
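
A short sketch of three of the selectors above. The boolean toy matrix X_bool is made up for illustration, and a linear-kernel SVC is assumed as the RFE estimator because RFE needs a model that exposes coefficients or feature importances.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, chi2
from sklearn.svm import SVC

# Boolean features; the first column is 0 in five of six samples,
# so its variance (1/6 * 5/6) falls below the 0.8 * (1 - 0.8) threshold.
X_bool = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]])
sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
X_reduced = sel.fit_transform(X_bool)
print(X_reduced.shape)

# Keep the two features with the highest chi-squared score.
X, y = load_iris(return_X_y=True)
selector = SelectKBest(chi2, k=2).fit(X, y)
X_new = selector.transform(X)
print(X_new.shape, selector.get_support())

# Rank all features by recursively eliminating the weakest one.
svc = SVC(kernel="linear", C=1)
rfe = RFE(estimator=svc, n_features_to_select=1, step=1).fit(X, y)
print(rfe.ranking_)
```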

Model Evaluation and Improvement

Importing cross validation	from sklearn.model_selection import cross_val_score
Cross-validation	scores = cross_val_score(model, X, y, cv=5)  # model is an unfitted estimator
Summarizing cross-validation scores	scores.mean()
stratified k-fold cross-validation	In stratified cross-validation, we split the data such that the proportions between classes are the same in each fold as they are in the whole dataset
Provides train/test indices to split data into train/test sets (from sklearn.model_selection import KFold)	kfold = KFold(n_splits=5, shuffle=False, random_state=None)
	cross_val_score(logreg, iris.data, iris.target, cv=kfold)
Importing Leave-one-out cross-validation	from sklearn.model_selection import LeaveOneOut
Leave-one-out cross-validation	loo = LeaveOneOut()
	scores = cross_val_score(logreg, iris.data, iris.target, cv=loo)
shuffle-split cross-validation	each split samples train_size many points for the training set and test_size many (disjoint) points for the test set
import shuffle-split	from sklearn.model_selection import ShuffleSplit
	shuffle_split = ShuffleSplit(test_size=.5, train_size=.5, n_splits=10)
	scores = cross_val_score(logreg, iris.data, iris.target, cv=shuffle_split)
takes an array of groups as argument that we can use	GroupKFold
Import GroupKFold	from sklearn.model_selection import GroupKFold
	scores = cross_val_score(logreg, X, y, groups=groups, cv=GroupKFold(n_splits=3))
Predicting with cross-validation	sklearn.model_selection.cross_val_predict(estimator, X, y=None, *, groups=None, cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', method='predict')
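
The splitters above behave quite differently on data that is sorted by class, as this sketch on iris suggests. The group labels and the make_blobs data in the GroupKFold part are invented for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris, make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, KFold, StratifiedKFold, cross_val_score

iris = load_iris()
logreg = LogisticRegression(max_iter=1000)

# iris is sorted by class, so 3 unshuffled folds each hold out one whole class.
plain = cross_val_score(logreg, iris.data, iris.target, cv=KFold(n_splits=3))
shuffled = cross_val_score(
    logreg, iris.data, iris.target,
    cv=KFold(n_splits=3, shuffle=True, random_state=0))
stratified = cross_val_score(
    logreg, iris.data, iris.target, cv=StratifiedKFold(n_splits=3))
print(plain, shuffled, stratified)

# GroupKFold keeps all samples of a group in the same fold.
X, y = make_blobs(n_samples=12, random_state=0)
groups = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3])  # hypothetical group ids
group_scores = cross_val_score(logreg, X, y, groups=groups, cv=GroupKFold(n_splits=3))
print(group_scores)
```

The unshuffled k-fold scores are all zero because each test fold contains a class the model never saw during training, which is the problem stratification addresses.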

Multilayer perceptrons (MLPs) or neural networks

Importing	from sklearn.neural_network import MLPClassifier
Train	mlp = MLPClassifier(solver='lbfgs', activation='tanh', random_state=0, hidden_layer_sizes=[10, 10]).fit(X_train, y_train)

There can be more than one hidden layer; to get several, pass a list to hidden_layer_sizes with one entry per layer.
If we want a smoother decision boundary, we could add more hidden units, add a second hidden layer, or use the tanh nonlinearity.
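
A minimal sketch of a two-hidden-layer network with the tanh nonlinearity. The make_moons data, the split, and max_iter=1000 are assumptions for this example. The shapes of coefs_ show one weight matrix per layer transition.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Two hidden layers with 10 units each.
mlp = MLPClassifier(solver="lbfgs", activation="tanh", random_state=0,
                    hidden_layer_sizes=[10, 10], max_iter=1000)
mlp.fit(X_train, y_train)

# input -> hidden 1, hidden 1 -> hidden 2, hidden 2 -> output
layer_shapes = [w.shape for w in mlp.coefs_]
print(layer_shapes)
train_accuracy = mlp.score(X_train, y_train)
test_accuracy = mlp.score(X_test, y_test)
print(train_accuracy, test_accuracy)
```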

Naive Bayes Classifiers

Importing	from sklearn.naive_bayes import GaussianNB
Train and predict	y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)
Function	class sklearn.naive_bayes.GaussianNB(*, priors=None, var_smoothing=1e-09)

There are three kinds of naive Bayes classifiers implemented in scikit-learn: GaussianNB, BernoulliNB, and MultinomialNB. GaussianNB can be applied to
any continuous data, while BernoulliNB assumes binary data and MultinomialNB
assumes count data (that is, that each feature represents an integer count of something).
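
As a small sketch of GaussianNB on continuous features, the example below fits the classifier on half of the iris data and counts the misclassified test points. The dataset and the 50/50 split are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)

n_wrong = int((y_test != y_pred).sum())
print(f"Mislabeled points out of {X_test.shape[0]}: {n_wrong}")

# One mean per class and feature is learned from the training data.
print(gnb.theta_.shape)
```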

Linear models for multiclass classification

Importing	from sklearn.svm import LinearSVC
Train linear SVC	linear_svm = LinearSVC().fit(X, y)
Import SVC	from sklearn.svm import SVC
train	svm = SVC(kernel='rbf', C=10, gamma=0.1).fit(X, y)
	kernel: function to use with the kernel trick; C: regularization parameter; gamma: controls the width of the Gaussian kernel
plot support vectors	sv = svm.support_vectors_
class labels of support vectors are given by the sign of the dual coefficients	sv_labels = svm.dual_coef_.ravel() > 0
Rescaling method for kernel SVMs	min_on_training = X_train.min(axis=0)
	range_on_training = (X_train - min_on_training).max(axis=0)
	X_train_scaled = (X_train - min_on_training) / range_on_training
	X_test_scaled = (X_test - min_on_training) / range_on_training

A common
technique to extend a binary classification algorithm to a multiclass classification
algorithm is the one-vs.-rest approach. In the one-vs.-rest approach, a binary model is
learned for each class that tries to separate that class from all of the other classes,
resulting in as many binary models as there are classes.
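
The one-vs.-rest structure is visible in the fitted attributes of LinearSVC, as the following sketch suggests. The three-class make_blobs data is an assumption for illustration. The prediction is the class whose binary model gives the highest decision value.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Three well-separated 2D clusters (assumption: toy data).
X, y = make_blobs(random_state=42)
linear_svm = LinearSVC().fit(X, y)

# One row of coefficients and one intercept per class.
print(linear_svm.coef_.shape, linear_svm.intercept_.shape)

# Each column of the decision function is one "class vs. rest" model.
decision = linear_svm.decision_function(X)
winner = linear_svm.classes_[np.argmax(decision, axis=1)]
```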

Lasso

Lasso	The lasso also restricts coefficients to be close to zero, but in a slightly different way, called L1 regularization. The consequence of L1 regularization is that when using the lasso, some coefficients are exactly zero. This means some features are entirely ignored by the model.
Importing	from sklearn.linear_model import Lasso
Train	lasso = Lasso(alpha=0.01, max_iter=100000).fit(X_train, y_train)
R^2	lasso.score(X_train, y_train)
Coefficients used	np.sum(lasso.coef_ != 0)
Figure legend	plt.legend()

In practice, ridge regression is usually the first choice between these two models.
However, if you have a large number of features and expect only a few of them to be
important, Lasso might be a better choice.
Note: There is a class called ElasticNet, which combines the penalties of Lasso and Ridge.
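
The difference between the two penalties can be sketched by counting non-zero coefficients. The synthetic make_regression data (50 features, 5 of them informative) and the alpha values are assumptions chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Many features, only a few of which actually influence the target.
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=5.0, max_iter=100000).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 sets some coefficients exactly to zero; L2 only shrinks them.
n_lasso = int(np.sum(lasso.coef_ != 0))
n_ridge = int(np.sum(ridge.coef_ != 0))
print(n_lasso, n_ridge)
```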

Linear models for regression

Importing	from sklearn.linear_model import LinearRegression
Split data set (from sklearn.model_selection import train_test_split)	X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
linear regression	lr = LinearRegression().fit(X_train, y_train)
slope	lr.coef_
intercept	lr.intercept_
R^2	lr.score(X_train, y_train)

scikit-learn always stores anything
that is derived from the training data in attributes that end with a
trailing underscore. That is to separate them from parameters that
are set by the user.
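
The naming convention can be seen in a short sketch: attributes with a trailing underscore exist only after fit, while user-set parameters are available through get_params() from the start. The synthetic data with slope 2 and intercept 1 is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic line y = 2x + 1 with a little noise (assumed for this example).
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

unfitted = LinearRegression()
print(hasattr(unfitted, "coef_"))      # learned attributes do not exist yet

lr = LinearRegression().fit(X_train, y_train)
print(lr.coef_, lr.intercept_)         # derived from the training data
print(lr.get_params())                 # set by the user, no trailing underscore
print(lr.score(X_test, y_test))
```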

k-nearest neighbors

Importing	from sklearn.neighbors import KNeighborsClassifier
k-nearest neighbors	knn = KNeighborsClassifier(n_neighbors=1)  # n_neighbors: number of neighbors
Building a model on the training set	knn.fit(X_train, y_train)
	The fit method returns the knn object itself (and modifies it in place), so we get a string representation of our classifier. The representation shows us which parameters were used in creating the model.
Predictions	prediction = knn.predict(data)
Accuracy	np.mean(y_pred == y_test)
	knn.score(X_test, y_test)

The k-nearest neighbors classification algorithm
is implemented in the KNeighborsClassifier class in the neighbors module.
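
Putting the rows of this section together, a minimal sketch on the iris data might look as follows. The measurements in X_new are a made-up flower used only to show a single prediction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# A single new sample must still be a 2D array: (n_samples, n_features).
X_new = np.array([[5, 2.9, 1, 0.2]])  # hypothetical measurements in cm
prediction = knn.predict(X_new)
print(iris_dataset["target_names"][prediction])

# Two equivalent ways to compute test accuracy.
y_pred = knn.predict(X_test)
manual_accuracy = np.mean(y_pred == y_test)
print(manual_accuracy, knn.score(X_test, y_test))
```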

scikit-learn Cheat Sheet (DRAFT) by Remidy08
