
Python - K-Nearest Neighbors (KNN) Model Cheat Sheet (DRAFT)

This is a draft cheat sheet. It is a work in progress and is not finished yet.

TO START

# IMPORT DATA LIBRARIES
import numpy as np
import pandas as pd

# IMPORT VIS LIBRARIES
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# IMPORT MODELLING LIBRARIES
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report,confusion_matrix

PRELIMINARY OPERATIONS

df = pd.read_csv('data.csv')
read the data
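Before standardising, it can help to glance at the data; a minimal check using standard pandas calls:

# Inspect the first rows and the column dtypes before modelling.
print(df.head())
df.info()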
STANDARDISE THE VARIABLES
KNN is distance-based, so features on larger scales would dominate the distance; standardising puts all features on the same scale.
scaler = StandardScaler()
scaler.fit(df.drop('y',axis=1))
scaled_feat = scaler.transform(df.drop('y',axis=1))
df_new = pd.DataFrame(scaled_feat,columns=df.columns[:-1])*
df.columns[:-1]: takes all the columns except the last one (here, the target column 'y').
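A quick sanity check on the result (assuming the layout above, with 'y' as the last column): each standardised feature should have mean ≈ 0 and standard deviation ≈ 1.

# Each scaled feature should have mean ~0 and std ~1.
print(df_new.mean().round(2))
print(df_new.std().round(2))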

TRAIN MODEL

CREATE X and y
X = df[['col1','col2', ...]]
create the features df (use the scaled df_new from above if you standardised)
y = df['col']
create the df of the variable to predict
SPLIT DATASET
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
split the data into train and test sets
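A minimal variant, assuming you want a reproducible split (random_state can be any fixed integer):

# Fixing random_state makes the split reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)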
FIT THE MODEL
knn = KNeighborsClassifier(n_neighbors=1)*
knn.fit(X_train,y_train)
train/fit the model
MAKE PREDICTIONS
pred = knn.predict(X_test)
make predictions
n_neighbors=1: we start with K = 1 and then see how to choose a better K value (see the evaluation block in this cheat sheet).
 

EVALUATION OF THE MODEL

EVALUATE MODEL
print(confusion_matrix(y_test,pred))
print(classification_report(y_test,pred))
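If a single accuracy number is also useful, accuracy_score from sklearn.metrics gives it (this import is an addition to the ones listed above):

from sklearn.metrics import accuracy_score

# Fraction of test samples classified correctly.
print(accuracy_score(y_test, pred))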
CHOOSING A BETTER K
error_rate = []*
create an empty list to store the error rate for each K
for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
ELBOW PLOT
plt.figure(figsize=(10,6))
plt.plot(range(1,40),error_rate)
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
Now choose the K value where the error rate starts to drop and flatten out, and repeat the model fitting and evaluation with that K (see the sketch after the explanation below). You should then obtain better results.
Explanation:
1. we create an empty list.
2. we loop over a range of possible K values, here 1 to 40.
3. we create and fit a KNN model for each of these K values.
4. we make predictions with each of these models.
5. we compute each model's mean error and store it in the list from step 1. We then plot these errors to see which K value could be the best one.
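A sketch of that final step (K = 23 is a hypothetical value read off the elbow plot; use whatever your own plot suggests):

# Refit with the K chosen from the elbow plot (23 here is hypothetical).
knn = KNeighborsClassifier(n_neighbors=23)
knn.fit(X_train,y_train)
pred = knn.predict(X_test)

# Re-evaluate; the report should improve over K = 1.
print(confusion_matrix(y_test,pred))
print(classification_report(y_test,pred))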