TO START
import numpy as np
import pandas as pd
|
SERIES (similar to numpy array)
pd.Series(data = list) |
create series from list |
pd.Series(data=list, index=labels) |
create series with index |
pd.Series(np_arr) |
create series from numpy array |
pd.Series(np_arr, labels) |
create series with index |
pd.Series(dict) |
create series from dictionary |
ser[num] or ser['label'] |
indexing (by position or by label) |
ser1 + ser2 |
sum two Series |
A pandas Series differs from a NumPy array because a Series can have axis labels, i.e. it can be indexed by a label instead of just a number location. It also doesn't need to hold numeric data: it can hold any arbitrary Python object, even functions (although rarely used).
Note: the keywords "data=" and "index=" can be omitted.
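A minimal sketch of the Series commands above; the labels and values here are made up for illustration.
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']                  # hypothetical index labels
my_list = [10, 20, 30]                    # hypothetical data
np_arr = np.array(my_list)
my_dict = {'a': 10, 'b': 20, 'c': 30}

ser1 = pd.Series(my_list)                 # series from list, default 0..n-1 index
ser2 = pd.Series(my_list, labels)         # series with a custom index
ser3 = pd.Series(np_arr, labels)          # series from a numpy array
ser4 = pd.Series(my_dict)                 # series from dict (keys become the index)

ser2['b']                                 # indexing by label -> 20
ser2 + ser3                               # element-wise sum, aligned on the index labels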
DATAFRAMES and INDEXING
df = pd.DataFrame() * |
create dataframe |
df['col'] * |
select col |
df.loc['row'] |
select row |
df.iloc[num] |
select a row by its integer position |
df.col |
select a column (opt.2 - avoid) |
df[['col1','col2']] * |
take two columns |
type(df['col']) |
column type |
df['new_col'] = [1,2,3] |
insert column |
df.drop('row',axis=0)* |
drop row |
df.drop('col',axis=1) |
drop column |
df.drop('col',axis=1, inplace=True)* |
permanent drop |
df.loc['row1','col1'] |
select a row and a column |
df.loc[['r1', 'r2'],['c1','c2']]* |
select 2 rows and 2 columns |
df>condition |
return boolean |
df[df>cond] |
return values |
df[df['col']>0] |
return rows of col that satisfy condition |
df[df['col1']>0]['col2'] |
return col2 that satisfy cond. on col1 |
df[df['c1']>0][['c2','c3']] |
return c2 & c3 that satisfy cond. on col1 |
df[(cond1) & (cond2)] |
return values that satisfy cond1 & cond2 |
df[(cond1) | (cond2)] |
return values that satisfy cond1 | cond2 |
df.reset_index() |
add num index |
df['new_col'] = 'NY LA'.split() |
add col quickly |
df.set_index('col') |
set a column as index |
df.set_index('col', inplace=True) |
make it permanent |
!!THERE IS ALSO MULTI-INDEXING!! |
The DataFrame constructor takes data (the values), index (the row labels), and columns (the column labels) as parameters.
Columns are Series.
To take two columns, note the double brackets [[ ]].
axis=0 can be omitted; it is the default value.
inplace=True will apply the result to the original dataframe. Without it, you are not changing the dataframe.
r = row.
c = column.
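A minimal sketch tying the DataFrame commands above together; the row and column names are invented for illustration.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 3),
                  index=['r1', 'r2', 'r3'],
                  columns=['c1', 'c2', 'c3'])

df['c1']                              # select one column (a Series)
df[['c1', 'c2']]                      # select two columns (double brackets)
df.loc['r1']                          # select a row by label
df.iloc[0]                            # select a row by integer position
df.loc['r1', 'c1']                    # one row, one column
df['c4'] = df['c1'] + df['c2']        # insert a new column
df.drop('c4', axis=1, inplace=True)   # permanently drop it

df[df['c1'] > 0]                      # rows where c1 > 0
df[(df['c1'] > 0) & (df['c2'] > 0)]   # rows satisfying both conditions
df.set_index('c3')                    # use a column as the index (not in place)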
|
|
MISSING DATA
df.isnull() |
check for na |
df.dropna() |
drop all rows with at least 1 na |
df.dropna(axis=1) |
drop all cols with at least 1 na |
df.dropna(thresh=n) |
keep rows with at least n non-na values |
df.fillna(value='value') |
replace na |
df['col'].fillna(value=df['col'].mean()) |
replace na using a function |
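A small sketch of the missing-data commands, on a toy frame with a few NaN values (the column names are made up).
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

df.isnull()                            # boolean mask of missing values
df.dropna()                            # drop rows with at least one NaN
df.dropna(axis=1)                      # drop columns with at least one NaN
df.dropna(thresh=2)                    # keep rows with at least 2 non-NaN values
df.fillna(value='FILL')                # replace every NaN with a constant
df['A'].fillna(value=df['A'].mean())   # replace NaN in A with the column mean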
GROUPBY
df.groupby('col') |
group rows by a col |
grouped_df.count() |
use cnt function |
grouped_df.mean() |
use mean function |
grouped_df.std() |
use std function |
grouped_df.min() |
use min function |
grouped_df.max() |
use max function |
grouped_df.describe() |
df descriptives |
df.groupby('col').count().loc['row'] |
apply function and take a row |
... .transpose() |
rotate results |
... .transpose()['row'] |
rotate and take a row |
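A minimal groupby sketch; the 'Company' and 'Sales' columns are invented for illustration.
import pandas as pd

df = pd.DataFrame({'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT'],
                   'Sales':   [200, 120, 340, 124]})

by_comp = df.groupby('Company')              # group rows by a column

by_comp.mean()                               # mean of the numeric columns per group
by_comp.std()                                # standard deviation per group
by_comp.count()                              # number of rows per group
by_comp.describe()                           # descriptive stats per group
by_comp.describe().transpose()               # rotate the results
df.groupby('Company').count().loc['GOOG']    # apply a function and take one row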
MERGING, JOINING, CONCATENATING
pd.concat([df1,df2,df3]) |
concatenate dfs |
pd.concat([...],axis=1) |
concatenate by col |
pd.merge()* |
merge two dfs |
df1.join(df2) |
join two dfs |
pd.merge() takes "df1", "df2", "how=", and "on=" parameters. "how=" can be "inner"/"outer"/"left"/"right"; "on=" has to be one or more key columns.
join() is similar to merge but works on the indexes, which can be different. It can also take the "how=" argument.
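A quick sketch of concat, merge, and join on two small invented frames.
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1'], 'A': ['A0', 'A1']})
df2 = pd.DataFrame({'key': ['K0', 'K1'], 'B': ['B0', 'B1']})

pd.concat([df1, df2])                        # stack the frames by rows
pd.concat([df1, df2], axis=1)                # glue them side by side
pd.merge(df1, df2, how='inner', on='key')    # SQL-style merge on the 'key' column

left = df1.set_index('key')
right = df2.set_index('key')
left.join(right, how='outer')                # join works on the indexes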
|
|
METHODS and FUNCTIONS
df['c'].unique() |
return unique values |
df['c'].nunique() |
count unique val |
df['c'].value_counts() |
count how many of same values |
df['c1'].apply(func) |
apply func to a col |
df['c1'].apply(len) |
apply len |
df['c1'].apply(sum) |
apply sum |
... .apply(lambda x:x+2) |
apply lambda |
df.index |
return idx names |
df.info() |
return df info |
df.describe() |
return df stats |
df.columns |
return col names |
del df['col']* |
delete col from df |
df.sort_values(by='col') |
sort df values |
df.pivot_table()* |
create a pivot tbl |
del differs from .drop() because it permanently removes a column from the df.
pivot_table() takes "values=", "index=", and "columns=" parameters. It reads: "Create a table from df, with the values of colx, indexed by colx2, and split into columns by the values in colx3".
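A compact sketch of the methods above; the column names and values are invented for illustration.
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

df['col2'].unique()                    # array of distinct values
df['col2'].nunique()                   # how many distinct values
df['col2'].value_counts()              # frequency of each value
df['col1'].apply(lambda x: x + 2)      # apply a lambda element-wise
df['col3'].apply(len)                  # apply len to each string
df.sort_values(by='col2')              # sort rows by a column
df.columns, df.index                   # column and index labels
df.pivot_table(values='col1', index='col2', columns='col3')   # pivot table
del df['col1']                         # permanently remove a column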
INPUT and OUTPUT code to start
# to import HTML tables
conda install lxml
conda install html5lib
conda install beautifulsoup4
# to use SQL
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
df.to_sql('data', engine)
sql_df = pd.read_sql('data',con=engine)
|
INPUT and OUTPUT operations
pwd |
print the notebook's working directory |
df = pd.read_csv('example') |
read csv |
df = pd.read_excel('name', sheet_name='name') |
read excel |
df = pd.read_html('address') |
read html tables (returns a list of dfs) |
df.to_csv('str',index=False) |
save as csv |
df.to_excel('name', sheet_name='name', index=False) |
save as xlsx |
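A round-trip sketch of the I/O operations above; the file names ('example.csv', 'Excel_Sample.xlsx') are placeholders.
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

df.to_csv('example.csv', index=False)        # save without the index column
df = pd.read_csv('example.csv')              # read it back

df.to_excel('Excel_Sample.xlsx', sheet_name='Sheet1', index=False)
df = pd.read_excel('Excel_Sample.xlsx', sheet_name='Sheet1')

# read_html returns a list of DataFrames, one per table found on the page
# tables = pd.read_html('https://en.wikipedia.org/wiki/Minnesota')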
|