Python - Decision Tree & Random Forest Cheat Sheet (DRAFT) by

Decision trees and random forest with Python

This is a draft cheat sheet. It is a work in progress and is not finished yet.

TO START

# IMPORT DATA LIBRARIES
import pandas as pd
import numpy as np

# IMPORT VIS LIBRARIES
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter/IPython only (omit in plain .py scripts)
%matplotlib inline

# IMPORT MODELLING LIBRARIES
from sklearn.model_selection import train_test_split

# libraries for decision trees
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report,confusion_matrix

# libraries for random forest
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report,confusion_matrix

PRELIMINARY OPERATIONS

df = pd.read_csv('data.csv')
load data from CSV
sns.pairplot(df, hue='col')
pairplot, coloured by 'col'
df.info()
check df info (dtypes, non-null counts)
df.describe()
check df summary stats
df.head()
check first rows of df

TRAIN MODEL - DECISION TREES

SPLIT DATASET
X = df[['col1', 'col2']]  # etc.
create features df
y = df['col']
create target variable (to predict)
X_train, X_test, y_train, y_test = train_test_split(
  X,
  y,
  test_size=0.3)
split df into train and test sets
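The split above is random, so the rows that land in each set change from run to run. A minimal sketch, assuming a small synthetic frame (the column names `col1`, `col2`, `col` are placeholders and the values are made up), that passes `random_state` to `train_test_split` so the split is repeatable:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# synthetic stand-in for df (placeholder column names, made-up values)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'col1': rng.normal(size=100),
    'col2': rng.normal(size=100),
    'col': rng.integers(0, 2, size=100),
})

X = df[['col1', 'col2']]
y = df['col']

# test_size=0.3 holds out 30% of the rows for testing;
# random_state fixes the shuffle so the same split comes back every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

print(X_train.shape, X_test.shape)
```

The value 42 is arbitrary; any fixed integer gives a reproducible split.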
FIT THE MODEL
tree = DecisionTreeClassifier()
instantiate model
tree.fit(X_train, y_train)
train/fit the model
MAKE PREDICTIONS
pred = tree.predict(X_test)
make predictions
EVALUATE MODEL
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))
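To read the `confusion_matrix` output: in scikit-learn, rows are the true classes and columns are the predicted classes, with labels in sorted order. A minimal pure-Python sketch of that tally (the helper `tally_confusion` and the toy labels are hypothetical, for illustration only), with the precision and recall that `classification_report` would show for class 1:

```python
def tally_confusion(y_true, y_pred):
    """Count (true, predicted) pairs; rows = true class, columns = predicted class."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix

# toy labels, made up for illustration
y_true = [0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 1, 1, 1, 0, 1]

labels, matrix = tally_confusion(y_true, y_hat)
for label, row in zip(labels, matrix):
    print(label, row)

# class 1: precision = TP / predicted as 1, recall = TP / actually 1
tp = matrix[1][1]
precision = tp / (matrix[0][1] + matrix[1][1])
recall = tp / (matrix[1][0] + matrix[1][1])
print(precision, recall)
```

Off-diagonal cells are the misclassifications: `matrix[0][1]` counts class 0 rows predicted as 1, and `matrix[1][0]` counts class 1 rows predicted as 0.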
 

TRAIN MODEL - RANDOM FOREST

SPLIT DATASET
X = df[['col1', 'col2']]  # etc.
create features df
y = df['col']
create target variable (to predict)
X_train, X_test, y_train, y_test = train_test_split(
  X,
  y,
  test_size=0.3)
split df into train and test sets
FIT THE MODEL
rfc = RandomForestClassifier(n_estimators=200)
instantiate model*
rfc.fit(X_train, y_train)
train/fit the model
MAKE PREDICTIONS
rfc_pred = rfc.predict(X_test)
make predictions
EVALUATE MODEL
print(confusion_matrix(y_test, rfc_pred))
print(classification_report(y_test, rfc_pred))
* n_estimators: number of trees to be used in the forest.
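A rough way to see what `n_estimators` does is to fit forests of different sizes on the same split and compare them. The sketch below is only an illustration: it assumes synthetic data from `make_classification` in place of `df`, and it uses `accuracy_score` as a quick single-number metric, which the cheat sheet itself does not use.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic data standing in for df (assumption, not from the cheat sheet)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for n in (10, 200):
    rfc = RandomForestClassifier(n_estimators=n, random_state=0)
    rfc.fit(X_train, y_train)
    # estimators_ holds the fitted trees, one per n_estimators
    acc = accuracy_score(y_test, rfc.predict(X_test))
    print(n, len(rfc.estimators_), round(acc, 3))
```

More trees generally give more stable predictions at the cost of longer training time; the gain on any particular dataset has to be checked empirically.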