
Python - Logistic Regression Model Cheat Sheet (DRAFT)

Logistic regression model in Python

This is a draft cheat sheet. It is a work in progress and is not finished yet.

TO START

# IMPORT DATA LIBRARIES
import pandas as pd
import numpy as np

# IMPORT VIS LIBRARIES
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# IMPORT MODELLING LIBRARIES
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

PRELIMINARY OPERATIONS

df = pd.read_csv('data.csv')
read data
df.head()
check head df
df.info()
check info df
df.describe()
check stats df
df.columns
check col names

VISUALISE DATA

sns.heatmap(df.isnull())*
check null values
sns.set_style('whitegrid')
set different style
sns.countplot(x='col', data=df)
countplot
sns.countplot(x='col', data=df, palette='')
countplot with a colour palette
sns.countplot(x='col', data=df, hue='', palette='')
countplot split by a second variable (hue)
sns.distplot(df['col'].dropna(), bins=30)
distri­bution plot
sns.heatmap() can take further useful parameters, e.g.:
yticklabels=False, cbar=False, cmap='viridis'
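Combining the two heatmap entries above, a minimal sketch of the null-values check:

# EXAMPLE: VISUALISE MISSING VALUES
# yticklabels/cbar off and a contrasting cmap make the NaN cells stand out
sns.heatmap(df.isnull(), yticklabels=False, cbar=False, cmap='viridis')
plt.show()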

DATA CLEANING

create a personalised function*
impute missing values
apply the personalised function*
apply function
dummy_var = pd.get_dummies(df['col'], drop_first=True)*
convert catego­rical features
df.drop(['old.col1', ...], axis=1)
drop old columns
df = pd.concat([df, dummy_var], axis=1)
add the new dummy var into the df
See the IMPUTING AND APPLY section.
drop_first=True: without this argument we would keep two mirror-image columns encoding the same information, leading to multicollinearity issues.
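Putting the three steps above together, a possible end-to-end sketch (the 'Sex' and 'Embarked' columns are illustrative assumptions):

# EXAMPLE: FULL DUMMY-VARIABLE WORKFLOW
sex = pd.get_dummies(df['Sex'], drop_first=True)          # drop_first avoids multicollinearity
embark = pd.get_dummies(df['Embarked'], drop_first=True)
df.drop(['Sex', 'Embarked'], axis=1, inplace=True)        # remove the original categorical columns
df = pd.concat([df, sex, embark], axis=1)                 # append the new dummy columns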
 

IMPUTING AND APPLY

# EXAMPLE OF A POSSIBLE FUNCTION TO IMPUTE MISSING VALUES
def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]

    if pd.isnull(Age):
        # impute a typical age for each passenger class
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age

# EXAMPLE OF HOW TO APPLY THE FUNCTION
train['Age'] = train[['Age','Pclass']].apply(impute_age,axis=1)
You can impute using the mean, median, etc. If you are interested in using Bayesian estimation, see:
https://github.com/jeweinberg/Pandas-MICE or
https://pypi.python.org/pypi/fancyimpute
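For instance, a minimal sketch of mean imputation as an alternative to the custom function (assuming the same 'Age' column):

# EXAMPLE: IMPUTE WITH THE COLUMN MEAN
train['Age'] = train['Age'].fillna(train['Age'].mean())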

TRAIN and EVALUATE MODEL

CREATE X and y
X = df[['col1','col2', ...]]
create df features
y = df['col']
create df var to predict
SPLIT DATASET
X_train, X_test, y_train, y_test =
train_test_split(
  X,
  y,
  test_size=0.3)
split df into train and test sets
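Optionally, pass a fixed random_state (101 here is an arbitrary choice) so the split is reproducible:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)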
FIT THE MODEL
log = LogisticRegression()
instantiate model
log.fit(X_train, y_train)
train/fit the model
MAKE PREDIC­TIONS
predictions = log.predict(X_test)
make predictions
EVALUATE MODEL
print(classification_report(y_test,predictions))
useful measures
confusion_matrix(y_test, predictions)
confusion matrix
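For a binary problem, the matrix can be read as follows (a sketch; sklearn orders rows as actual classes, columns as predicted classes):

# EXAMPLE: READ THE CONFUSION MATRIX
# [[TN  FP]
#  [FN  TP]]
print(confusion_matrix(y_test, predictions))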