
Python Pandas Cheat Sheet (DRAFT)

This is a draft cheat sheet. It is a work in progress and is not finished yet.

STOPPED FILLING AT LECTURE 6 LINE 122

 

Basics

%matplotlib inline
renders plots inside the notebook
df = pd.read_csv(path, index_col='name')
loads a CSV file into a dataframe, using the 'name' column as the row index
df.head()
df.tail()
df.values
df.plot()
df.plot(style='.')
df.index
returns the row index (the row labels)
df[col].loc[index]
returns the value at the given column and row label
df.loc[:, 'col_n+1'] = x
referring to a col that doesn't exist creates a new one
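
A minimal sketch of the basics above, assuming a small in-memory CSV instead of a file on disk. The column names (name, height, weight, bmi) are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real call would pass a file path instead.
csv_text = "name,height,weight\nanna,170,60\nben,182,81\ncara,165,55\n"

# Use the 'name' column as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col="name")

# Select a column first, then look up one row label in it.
bens_height = df["height"].loc["ben"]

# Assigning to a column label that does not exist yet creates the column.
df.loc[:, "bmi"] = df["weight"] / (df["height"] / 100) ** 2

print(df.index.tolist())
print(df.columns.tolist())
```

The same lookup can also be written as df.loc['ben', 'height'], which selects row and column in one step.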
 

Basic Dataframe Analysis

df.isnull()
returns a boolean dataframe, True where a value is missing
df.isnull().sum()
counts the missing values in each column
df.isnull().any()
checks, per column, whether at least one value is missing
df[col].max()
df[col].min()
df[col].idxmax()
df[col].idxmin()
df[col].median()
df[col].mean()
df[col].describe()
gives summary statistics (count, mean, std, min, quartiles, max)
df[col].quantile(.5)
50% quantile (equal to the median)
df.boxplot(by='col')
boxplots grouped by the values of a column
df.hist(bins=20)
histogram with 20 bins for each numeric column
df.plot.scatter(x='name', y='name')
scatterplot of one column against another
pd.plotting.scatter_matrix(df)
grid of pairwise scatterplots of the numeric columns
pd.plotting.parallel_coordinates(df, 'name')
one line per row, connecting its values across the numeric columns and colored by the class column 'name'
df['col_name'].unique()
returns an array of the distinct values
pd.get_dummies(df, columns=['Name'])
replaces the column with dummy columns, one per distinct value, that indicate whether a row holds that value (0/1, or True/False in pandas 2.0 and later)
np.random.choice(n, x, replace=False)
draws x distinct random values from range(n)
np.setdiff1d(set_1, set_2)
sorted unique values of set_1 that are not in set_2
df.to_numpy()
returns the values as a NumPy array
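
A short sketch combining several of the calls above: counting missing values, building dummy columns, and using np.random.choice with np.setdiff1d to split rows into two random groups. The data, the column names, the seed and the 4/2 split are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; names and values are made up.
df = pd.DataFrame({
    "size": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
    "color": ["red", "green", "red", "blue", "green", "red"],
})

# Missing values per column, and whether each column has any at all.
missing_per_column = df.isnull().sum()
has_missing = df.isnull().any()

# One indicator column per distinct value of 'color'.
dummies = pd.get_dummies(df, columns=["color"])

# Random split of row positions, seeded so the run is repeatable.
np.random.seed(0)
n = len(df)
train_idx = np.random.choice(n, 4, replace=False)   # 4 distinct positions
test_idx = np.setdiff1d(np.arange(n), train_idx)    # the remaining positions

train, test = df.iloc[train_idx], df.iloc[test_idx]
print(missing_per_column)
print(dummies.columns.tolist())
```

Because replace=False guarantees distinct positions, the two index arrays never overlap and together cover every row.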
 

Working with a Dataframe

df['col1'] == x
boolean Series, True where the entry equals x
df[df == x] = y
replaces every value equal to x with y
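
A small sketch of how a comparison produces a boolean mask and how that mask is used to filter rows or overwrite values. The dataframe contents are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 1, 3], "col2": [1, 5, 6, 1]})

# Comparing a column with a value gives one True/False per row.
mask = df["col1"] == 1

# Indexing with the mask keeps only the rows where it is True.
rows_with_one = df[mask]

# Comparing the whole dataframe gives a boolean dataframe;
# assigning through it replaces every matching cell.
df[df == 1] = 0

print(mask.tolist())
print(df)
```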

Label-based indexing with .loc / .iloc

df.loc[rowindex, columnname]
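df.loc[3, 'col1']
value in column 'col1' at the row labeled 3
df.loc[3:6, ['col1', 'col2']]
rows labeled 3 through 6 (end label included) of columns 'col1' and 'col2'
df.loc[:, 'col1'] == 'name'
boolean Series, True where the entry in 'col1' equals 'name'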
 
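df.iloc[3:-1, 2:]
[rows, columns]: rows from position 3 up to but excluding the last, columns from position 2 onward
df.iloc[:, [3, 1]]
columns at positions 3 and 1, in that order
.loc is label based, .iloc is integer position based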
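
A sketch of the difference between the two indexers, assuming a dataframe whose row labels deliberately differ from the row positions. The labels 10 to 15 and the column names are made up for illustration.

```python
import pandas as pd

# Row labels 10..15 sit at positions 0..5.
df = pd.DataFrame(
    {
        "col1": list("abcdef"),
        "col2": list(range(6)),
        "col3": list(range(10, 16)),
    },
    index=[10, 11, 12, 13, 14, 15],
)

# .loc slices by label and includes the end label: rows 11, 12, 13.
by_label = df.loc[11:13, ["col1", "col2"]]

# .iloc slices by position and excludes the end: positions 1 and 2.
by_position = df.iloc[1:3, :2]

# A list of positions selects columns in the order given.
reordered = df.iloc[:, [2, 0]]

print(by_label)
print(by_position)
print(reordered.columns.tolist())
```

With the default 0, 1, 2, ... index, labels and positions coincide, so the inclusive end of a .loc slice is the main visible difference.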

Series

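s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
creates a pandas Series
s1.add(s2, fill_value=0)
adds two Series aligned on their index labels; a label missing from one Series counts as 0
s.isnull() ; s.notnull()
boolean Series marking missing / non-missing entries
s.dropna()
drops all entries with missing values
s.fillna(x)
replaces missing values with x
s = pd.DataFrame({'Size': s1, 'Weight': s2})
Best way to build a dataframe from Series: pass a dict that maps column names to Series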
'e' in s1
True if 'e' is a label in the index of s1
s.name = 'str'
names series
s.inde­x.name = 'str'
names index
If s doesn't exist, this creates a df
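s.columns = ['Red', 'Green']
sets the column labels (s is a dataframe here)
s.columns.name = 'Color'
names the column index
s.reindex(['m', 'n', 'o'], method='ffill')
ffill = forward fill: a label missing from the old index takes the value of the nearest preceding label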
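
A brief sketch of index alignment on Series, assuming two small Series with partly different labels. The values and the names s2 and s3 are made up for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["b", "d"])

# Plain addition aligns on labels; a label missing on one side gives NaN.
plain_sum = s1 + s2

# fill_value=0 treats the missing side as 0 instead.
filled_sum = s1.add(s2, fill_value=0)

# A dict of Series becomes a dataframe with one column per key,
# indexed by the union of the labels.
df = pd.DataFrame({"Size": s1, "Weight": s2})

# Forward fill on reindex needs a sorted index; new labels take the
# value of the nearest preceding label.
s3 = pd.Series([1.0, 2.0], index=["a", "c"])
forward_filled = s3.reindex(["a", "b", "c", "d"], method="ffill")

print(filled_sum)
print(forward_filled)
```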