Python Pandas Cheat Sheet

Basics

%matplotlib inline	plots into notebook
df = pd.read_csv(path, index_col ='name')	loads dataframe
df.head()
df.tail()
df.values
df.plot() df.plot(style='.')
df.index	returns row indexes
df[col].loc[index]	returns value with given column and index
df.loc[:, 'col_n+1'] = x	referring to a col that doesn't exist creates a new one

STOPPED FILLING AT LECTURE 6 LINE 122

Basic Dataframe Analysis

df.isnull()	returns bool
df[.isnull().sum	returns sum of trues
df.isnull().any	checks whether there is a true
`df[col].` max() `df[col].` min()
`df[col].` idxmax() `df[col].` idxmin()
`df[col].` median()
`df[col].` mean()
`df[col].` describe()	gives statistic analysis
`df[col].` quantile(.5)	50% quantile
df.boxplot(by = 'col')	boxplot grouped by column
df.hist(bins = 20)	histogram in 20 bars
df.plot.scatter(x = 'name', y = 'name')	scatterplot
pd.plotting.scatter_matrix(df)	multiple scatterplots
pd.plotting.parallel_coordinates(df, 'name')	lines drawn connecting dimensions of an entry
df['col_name'].unique	returns list of singled entries
pd.get_dummies(df, columns=['Name'])	dummie column (0 or 1) that indicates whether the entry in another column is a certain entry
np.random.choice(n, x, replace=false)	selects random set
np.setdiff1d(set_1, set_2)	New set with only the differing entries
df.to_numpy()	gives array of entries

Working with a Dataframe

df['col1'] == x	bool if entry is x
df[df == x] = y	replace all values of a kind

Label-based indexing with .loc / .iloc

df.loc[rowindex, columnname]
df.loc[3, col1] df.loc[3:6, ['col1', 'col2']]	3rd entry of 1st column
df.loc[:, 'col1' ] == 'name'	column with t/f whether entry in col1 is name

df.iloc[3:-1, 2:]	[rows, columns]
df.iloc[:, [3, 1]]	columns with index 3 & 1

.loc

is label based,

.iloc

is integer index based

Series

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])	creates a pandas series
s1.add(s2, fill_value=0)
s.isnull() ; s.notnull()
s.dropna()	drops all rows with missing values
s.fillna(x)
s = pd.DataFrame({'Size':s1, 'Weight':s2})	Best way to define dataframe out of series: Give dict out of columns
'e' in s1	returns bool
s.name = 'str'	names series
s.index.name = 'str'	names index If s doesn't exist, this creates a df
s.columns['Red', 'Green'] s.columns.name = 'Color'
s.reindex[('m', 'n', 'o'], method = 'ffill')	ffill = forward fill

Download the Python Pandas Cheat Sheet

1 Page

Latest Cheat Sheet

3 Pages

(0)

Git/Github Cheat Sheet

This Git Cheatsheet is a quick reference guide to essential Git commands used in version control. It provides concise syntax and examples to help developers manage their code efficiently across local and remote repositories.

musmankkh

25 Jul 25

devtools, versioncontrol, gitcommands

Recent Cheat Sheet Activity

Python Pandas Cheat Sheet (DRAFT) by Lasse1618

Basics

STOPPED FILLING AT LECTURE 6 LINE 122

Basic Dataframe Analysis

Working with a Dataframe

Label-based indexing with .loc / .iloc

Series

Latest Cheat Sheet

Random Cheat Sheet

About Cheatography

Behind the Scenes

Recent Cheat Sheet Activity

Please Disable Your Ad Blocker

Python Pandas Cheat Sheet (DRAFT) by Lasse1618

Basics

STOPPED FILLING AT LECTURE 6 LINE 122

Basic Dataframe Analysis

Working with a Dataframe

Label-­based indexing with .loc / .iloc

Series

Latest Cheat Sheet

Random Cheat Sheet

About Cheatography

Behind the Scenes

Recent Cheat Sheet Activity

Please Disable Your Ad Blocker

Label-based indexing with .loc / .iloc