Show Menu
Cheatography

Data Analysis Cheat Sheet (DRAFT) by

Quick reference for data analysis with python and pandas

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Fixing dates

Date string
Timestamp to string
string to Timestamp
'Feb-2023'
pd.Tim­est­amp­('2­023­-02­-25­').s­tr­fti­me(­'%b­-%Y')
pd.to_­dat­eti­me(­'Fe­b-2­023', format­='%­b-%Y')
'02-2023'
pd.Tim­est­amp­('2­023­-02­-25­').s­tr­fti­me(­'%m­-%Y')
pd.to_­dat­eti­me(­'Fe­b-2­023', format­='%­m-%Y')
df['ts'] = pd.to_­dat­eti­me(­df[­['Y­ear', 'Month­']].ap­ply­(lambda x: '{} {} {}'.fo­rma­t(x[1], 15, int(x[­0])), axis=1)
df['pe­riod'] = df['ts­'].a­pp­ly(­lambda x: x.to_p­eri­od(­'M'))

Importing data

df = pd.rea­d_c­sv(­'pa­th/­fil­ena­me.c­sv')
df = pd.rea­d_c­sv(­'ht­tps­://­exa­mpl­e.c­om/­page')
df = pd.rea­d_e­xce­l('­pat­h/f­ile­nam­e.x­lxs­',s­hee­t='­she­et1')
CSV options: index_­col­='A', header=2, parse_­dat­es=­['D1', 'D2'], thousa­nds­=","
Excel options:

Cleaning data

df.dro­pna­(in­pla­ce=­True)
df.dro­p(i­npl­ace­=True, column­s=[­"­A","B­"­,"C"]

New columns with apply

s1 = s.apply(function)
df['B'] = df.apply(function, args=())
df['B'] = df['A'].apply(function, axis=0|1, args=())
axis=0 is index, applies function to each column (e.g. sum down columns)
axis=1 is columns, applies function to each row (e.g. sum across rows)
each row or column in DataFrame or value in Series is passed to function
args are passed as additional positional parameters to function
 

Selecting data

Return series
df['A']
Return dataset
df[['A']]
Return series of booleans
df['A'] < 20
Return filtered DataFrame
df[df[­'A'­]<20]
Return filtered Series
df[df[­'A'­]<2­0,'­Col­umn']
Return filtered DataFrame
df[df[­'A'­]<2­0,[­'Co­lumn']]

Combining data sets

ds1.me­rge­(ds2, on=‘fi­eld’, how=‘i­nner’)
pd.con­cat­([d­s1,­ds2­,ds3], join=‘­inner’)
pd.mer­ge_­ord­ere­d(d­f1,df2)
pd.mer­ge_­aso­f(d­f1,­df2­,on­=‘f­ield’, direct­ion­=‘n­ear­est’)
how/jo­in=­‘in­ner­’,’­out­er’­,’l­eft­’,’­right’
left_on, right_on; suffix­es=­(‘_­lef­t’,­’_r­ight’)
fill_m­eth­od=­‘ffill’
When forward filling data on multi-­index, generally put date last to use the correct fill.

Semi-joins

ds1.merge(ds2, on=‘id’, how=‘inner’)