pandas groupby Cheat Sheet

Pandas objects

Series	a one-dimensional labeled array that can hold any data type
DataFrame	a two-dimensional labeled data structure with columns of potentially different types
Index	a sequence of axis labels that can be used to identify rows or columns in a DataFrame
MultiIndex	a hierarchical index object that allows for more than one index level in a DataFrame.
Timestamp	a specific moment in time, represented in Pandas as a datetime object.
Period	a specific interval of time, represented in Pandas as a period object.
DatetimeIndex	an index of datetime objects, used to index Pandas objects like Series and DataFrame.
Timedelta	a duration of time, represented in Pandas as a timedelta object.
Categorical	a data type used to represent categorical variables,
Sparse	a data structure used to represent sparse data efficiently,
Interval	a data type used to represent intervals
DatetimeTZ	a datetime object with a timezone.

Pandas functions called on pd

pd.read_csv()	used to read data from a CSV file and create a DataFrame.
pd.DataFrame()	a constructor function that is used to create a new DataFrame from data in memory.
pd.Series():	used to create a new Series object.
pd.concat()	used to concatenate two or more DataFrames or Series
pd.merge():	used to merge two DataFrames based on a common column.
pd.groupby()	used to group data in a DataFrame based on one or more columns.
pd.pivot_table():	used to create a pivot table from a DataFrame.
pd.to_datetime():	used to convert a column of strings to datetime objects.
pd.to_numeric()	used to convert a column of strings to numeric objects.
pd.cut()	This function is used to bin data into discrete intervals.
pd.qcut():	This function is used to bin data into quantiles.
pd.date_range()	used to create a range of dates or DatetimeIndex.
pd.timedelta()	used to create a timedelta object representing a duration of time.

String vectorization

str.replace():	used to replace a pattern in a string with another pattern
str.extract()	used to extract a pattern from a string and return the first match.
str.split()	used to split a string into a list of strings based on a delimiter.
str.join()	used to concatenate a list of strings with a separator string
str.startswith()	used to check if a string starts with a particular substring
str.endswith()	used to check if a string ends with a particular substring
str.lower(), str.upper():	used to convert the case of a string to lowercase or uppercase, respectively
str.strip():	used to remove leading and trailing whitespace from a string
str.contains()	used to check if a pattern exists in a string

Date Vectorization

pd.to_datetime():	used to convert a column of strings or Unix timestamps to a datetime object.
dt.date	used to extract the date component of a datetime object or DatetimeIndex.
dt.time	extract the time component of a datetime object or DatetimeIndex.
dt.hour, dt.minute, dt.second	used to extract the hour, minute, and second components of a datetime object or DatetimeIndex.
dt.dayofweek:	used to extract the day of the week as an integer,
dt.day_name()	used to extract the name of the day of the week
dt.month_name()	used to extract the name of the month
dt.tz_localize() and dt.tz_convert():	used to set or convert the timezone of a datetime object or DatetimeIndex.
dt.is_month_start and dt.is_month_end	used to check if a datetime object or DatetimeIndex is at the start or end of a month, respectively.

pandas groupby

Consider it as DataFrame; you can use all DataFrame attributes and functions on the group object.

Pandas pivot_table

data	The DataFrame or Series to be used for the pivot table.
values	The column to aggregate. If not specified, all numerical columns will be used.
index	The column(s) to use as row labels.
columns	The column(s) to use as column labels.
aggfunc	The function to use for aggregating the values. Defaults to 'mean'.
margins	Add all row/columns (e.g. 'All') label and compute grand total for that row/column.
margins_name	The label for the row/column that contains the grand total. Defaults to 'All'.
dropna	Whether or not to exclude rows with missing values. Defaults to True.
sort	Whether or not to sort the rows by the values of the index.

Pandas multiIndex

# Create a MultiIndex from two separate indexes
index1 = pd.Index(['A', 'B', 'C'], name='Index1')
index2 = pd.Index(['X', 'Y', 'Z'], name='Index2')
multi_index = pd.MultiIndex.from_product([index1, index2], names=['Index1', 'Index2'])

# Create a MultiIndex from a list of tuples
tuples = [('A', 'X'), ('A', 'Y'), ('B', 'X'), ('B', 'Y')]
multi_index = pd.MultiIndex.from_tuples(tuples, names=['Index1', 'Index2'])

# Select data from a single level of the index
df.loc['A']  # selects all rows where Index1 = 'A'
df.loc[('A', 'X')]  # selects a single row where Index1 = 'A' and Index2 = 'X'

# Select data from multiple levels of the index
# selects rows where Index1 is 'A' or 'B' and Index2 is 'X', and returns the 'Column1' values
df.loc[(['A', 'B'], 'X'), 'Column1'] 

# Stack a DataFrame to move a level of the MultiIndex to the columns
df.stack()

# Unstack a DataFrame to move a level of the columns to the index
df.unstack()

pandas groupby Cheat Sheet (DRAFT) by Daryabi

Pandas objects

Pandas functions called on pd

String vectorization

Date Vectorization

pandas groupby

Pandas pivot_table

Pandas multiIndex

Latest Cheat Sheet

Random Cheat Sheet

About Cheatography

Behind the Scenes

Recent Cheat Sheet Activity

Please Disable Your Ad Blocker

pandas groupby Cheat Sheet (DRAFT) by Daryabi

Pandas objects

Pandas functions called on pd

String vector­ization

Date Vector­ization

pandas groupby

Pandas pivot_­table

Pandas multiIndex

Latest Cheat Sheet

Random Cheat Sheet

About Cheatography

Behind the Scenes

Recent Cheat Sheet Activity

Please Disable Your Ad Blocker

String vectorization

Date Vectorization

Pandas pivot_table