
Machine Learning in R and Python Cheat Sheet (DRAFT) by

Data Processing and Machine Learning in R and Python

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Introduction

This cheat sheet provides a side-by-side comparison of basic data processing techniques and machine learning models in R and Python.


Load dataset in R

library(datasets)
Load the built-in datasets package
data(iris)
Load the iris dataset
head(iris)
Look up the first 6 rows of the dataset
summary(iris)
Get summary statistics for each column
names(iris)
Get the column names

Data prepro­cessing in R

scaling = preProcess(data, method = c('center', 'scale'))
Create a scaling transform from the data (caret)
data_scaled = predict(scaling, data)
Apply the scaling to the data
train_partition = createDataPartition(y, p = 0.8, list = FALSE)
Balanced split based on the outcome (80/20 split)
data_train = data[train_partition, ]
Select the training rows
data_test = data[-train_partition, ]
Select the test rows

Supervised learning models in R

model = lm(y ~ x, data = data)
Simple linear regression
model = lm(y ~ x1 + x2 + x3, data = data)
Multiple linear regression
summary(model)
Print summary statistics of the fitted linear model
predictions = predict(object, newdata)
Make predictions from a fitted model object
model = glm(y ~ x1 + x2 + x3, data = data, family = 'binomial')
Logistic regression
model = svm(y ~ x1 + x2 + x3, data = data, params)
Support vector machine (SVM, package e1071)
model = rpart(y ~ x1 + x2 + x3, data = data, params)
Decision tree
model = randomForest(y ~ x1 + x2 + x3, data = data, params)
Random forest
data_xgb = xgb.DMatrix(data, label = label)
Convert the data into DMatrix format
model = xgb.train(params, data_xgb, nrounds)
Gradient boosting model (xgboost)
predictions = knn(train, test, cl, k)
k-NN with labels cl and k neighbors (package class)

Unsupervised learning models in R

model = kmeans(x, centers)
K-means clustering with the given number of centers
model = prcomp(x, params)
Principal component analysis (PCA)

Model perfor­mance in R

RMSE(pred, actual)
Root mean square error
R2(pred, actual, form = 'traditional')
Proportion of the variance explained by the model
mean(actual == pred)
Accuracy (proportion of correct predictions)
confusionMatrix(pred, actual)
Confusion matrix (caret: predictions first, reference second)
auc(actual, pred)
Area under the ROC curve
f1Score(actual, pred)
F1 score: harmonic mean of precision and recall

Data visual­ization in R

geom_point(x, y, color, size, fill, alpha)
Scatter plot
geom_line(x, y, color, size, fill, alpha, linetype)
Line plot
geom_bar(x, y, color, size, fill, alpha)
Bar chart
geom_boxplot(x, y, color)
Box plot
geom_tile(x, y, color, fill)
Heatmap
 

Import file in Python

import pandas as pd
Import the package
df = pd.read_csv('file.csv')
Read a CSV file into a DataFrame
df.head(n)
Look up the first n rows of the dataset
df.describe()
Get summary statistics for each column
df.columns
Get the column names
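A minimal sketch of the loading-and-inspecting steps above. Since `pd.read_csv` needs a file on disk, this example builds the DataFrame from scikit-learn's bundled iris dataset instead; the `head`/`describe`/`columns` calls work the same on any DataFrame.

```python
# Load iris into a pandas DataFrame via scikit-learn (no CSV file required)
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame                       # 150 rows: 4 feature columns + 'target'

print(df.head(3))                     # first 3 rows
print(df.describe().loc['mean'])      # per-column means from the summary table
print(list(df.columns))               # column names
```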

Data Processing in Python

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
Split the dataset into training (80%) and test (20%) sets
scaler = StandardScaler()
Standardize features by removing the mean and scaling to unit variance
X_train = scaler.fit_transform(X_train)
Fit the scaler on X_train and transform it
X_test = scaler.transform(X_test)
Transform X_test with the fitted scaler
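The split-then-scale workflow above can be sketched end to end. The data here is synthetic, made up purely for illustration; the key point is that the scaler is fit on the training set only and its statistics are reused on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data: 100 samples, 3 features, binary labels (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)   # fit on training data only
X_test = scaler.transform(X_test)         # reuse the training-set mean/variance
```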

Supervised learning models in Python

model = LinearRegression()
Linear regression
model.fit(X_train, y_train)
Fit the linear model
model.predict(X_test)
Predict using the linear model
LogisticRegression().fit(X_train, y_train)
Logistic regression
LinearSVC().fit(X_train, y_train)
Train a linear SVM (solves the primal problem)
SVC().fit(X_train, y_train)
Train a kernel SVM (solves the dual problem)
DecisionTreeClassifier().fit(X_train, y_train)
Decision tree classifier
RandomForestClassifier().fit(X_train, y_train)
Random forest classifier
GradientBoostingClassifier().fit(X_train, y_train)
Gradient boosting for classification
XGBClassifier().fit(X_train, y_train)
XGBoost classifier
KNeighborsClassifier().fit(X_train, y_train)
k-NN classifier
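All of the estimators above share the same fit/predict pattern; a minimal end-to-end sketch with logistic regression on iris (any other classifier in the list slots in the same way):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)   # raise max_iter so the solver converges
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = (y_pred == y_test).mean()             # fraction of correct test predictions
```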

Unsupervised learning models in Python

KMeans().fit(X)
K-means clustering
PCA().fit(X)
Principal component analysis (PCA)
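A minimal sketch of both unsupervised models on iris: K-means assigns a cluster label to each row, and PCA projects the four features onto two components.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_                      # cluster assignment per row

pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)                  # project onto the first 2 components
```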

Model perfor­mance in Python

metrics.mean_squared_error(y_true, y_pred, squared=False)
Root mean squared error
metrics.r2_score(y_true, y_pred)
Proportion of the variance explained by the model
metrics.confusion_matrix(y_true, y_pred)
Confusion matrix
metrics.accuracy_score(y_true, y_pred)
Accuracy classification score
metrics.roc_auc_score(y_true, y_score)
Compute ROC-AUC from prediction scores
metrics.f1_score(y_true, y_pred, average='macro')
Harmonic mean of precision and recall
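A minimal sketch of the classification metrics above on toy labels (the labels and scores are made up for illustration):

```python
from sklearn import metrics

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.9, 0.8, 0.4, 0.2]  # predicted probabilities for class 1

acc = metrics.accuracy_score(y_true, y_pred)     # 4 of 6 predictions correct
cm = metrics.confusion_matrix(y_true, y_pred)    # rows: true class, cols: predicted class
auc = metrics.roc_auc_score(y_true, y_score)     # needs scores, not hard labels
f1 = metrics.f1_score(y_true, y_pred, average='macro')
```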

Data visual­ization in Python

sns.scatterplot(x, y, hue, size)
Scatter plot
sns.lineplot(x, y, hue, size)
Line plot
sns.barplot(x, y, hue)
Bar chart
sns.boxplot(x, y, hue)
Box plot
sns.heatmap(data, linecolor, linewidths)
Heatmap
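A minimal sketch of one of the plot types above. The DataFrame here is made up for illustration, and the non-interactive Agg backend is used so the figure can be rendered without a display.

```python
import matplotlib
matplotlib.use('Agg')          # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data, made up for illustration
df = pd.DataFrame({'x': [1, 2, 3, 4],
                   'y': [2.0, 4.1, 5.9, 8.2],
                   'group': ['a', 'a', 'b', 'b']})

ax = sns.scatterplot(data=df, x='x', y='y', hue='group')
ax.set_title('Scatter plot')
plt.savefig('scatter.png')     # write the figure to a file
```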