Cheatography

Jupyter Cheat Sheet (DRAFT) by

Basic code to explore/analyse datasets

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Explore and Pre-process Data

Exploring Data:

# Get basic information about the DataFrame
df.info()

# Summary statistics for numerical columns
df.describe()

# Number of unique values in each column
df.nunique()

# Count missing values in each column
df.isnull().sum()
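To see what these exploration calls report, here is a minimal sketch on a small hand-made DataFrame. The columns `city` and `temp` and their values are invented for illustration. Note that `nunique()` ignores missing values by default, so a column with a NaN still reports only its distinct real values.

```python
import numpy as np
import pandas as pd

# Illustrative data only: one missing value in each column
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", None],
    "temp": [3.5, 19.0, 3.5, np.nan],
})

# Distinct non-null values per column (NaN is skipped by default)
unique_counts = df.nunique()

# isnull() marks each missing cell as True; sum() counts the Trues per column
missing_counts = df.isnull().sum()

print(unique_counts)
print(missing_counts)
```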

Cleaning Data:

# Remove duplicate rows
df = df.drop_duplicates()

# Drop every column that contains at least one missing value
df = df.dropna(axis=1)

# Fill missing values in a column with a specific value (assign the result back)
df['column_name'] = df['column_name'].fillna(value)

# Drop every row that contains at least one missing value
df = df.dropna()
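The cleaning steps above differ mainly in which axis they act on. The sketch below, using invented columns `a`, `b` and `c`, shows that `dropna(axis=1)` removes columns while the default `dropna()` removes rows. It also fills a single column by assigning the result back, since recent pandas versions warn that calling `fillna(..., inplace=True)` on a selected column may not update the original DataFrame.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 duplicates row 0; 'b' and 'c' each have one NaN
df = pd.DataFrame({
    "a": [1, 1, 2, 3],
    "b": [10.0, 10.0, np.nan, 30.0],
    "c": [5.0, 5.0, 6.0, np.nan],
})

# Remove exact duplicate rows (keeps the first occurrence, index 0)
deduped = df.drop_duplicates()

# axis=1: drop each column that has any NaN, leaving only 'a'
no_nan_columns = deduped.dropna(axis=1)

# Default axis=0: drop each row that has any NaN, leaving only index 0
no_nan_rows = deduped.dropna()

# Fill NaN in one column by assigning back instead of using inplace=True
filled = deduped.copy()
filled["b"] = filled["b"].fillna(0)
```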

# Replace values in a column (assign the result back)
df['column_name'] = df['column_name'].replace({old_value: new_value})

# Convert data types
df['column_name'] = df['column_name'].astype('new_data_type')

# Rename columns
df.rename(columns={'old_column_name': 'new_column_name'}, inplace=True)
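Replacing values, converting types and renaming columns are often chained together when tidying a freshly loaded file. A minimal sketch, assuming a hypothetical table where scores were read in as strings:

```python
import pandas as pd

# Illustrative data: 'score' holds numbers stored as text
df = pd.DataFrame({"grade": ["A", "B", "F"], "score": ["91", "78", "40"]})

# Map old values to new ones; values missing from the dict stay unchanged
df["grade"] = df["grade"].replace({"F": "Fail"})

# Convert the text column to 64-bit integers
df["score"] = df["score"].astype("int64")

# rename() returns a new DataFrame unless inplace=True is passed
df = df.rename(columns={"score": "points"})
```

After the conversion, numeric operations such as `df["points"].sum()` work on real integers rather than strings.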

# Filter rows based on a condition
filtered_df = df[df['column_name'] > value]

# Multiple conditions
filtered_df = df[(df['column1'] > value1) & (df['column2'] < value2)]
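Each condition above produces a boolean Series that acts as a row mask. In Python, `&` and `|` bind more tightly than `>` and `<`, so every comparison needs its own parentheses; the keywords `and`/`or` do not work here because pandas refuses to reduce a Series to a single True/False. A small sketch with invented `age` and `income` columns:

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({"age": [25, 32, 47, 51],
                   "income": [30_000, 58_000, 41_000, 90_000]})

# Single condition: keep rows where the mask is True
older = df[df["age"] > 30]

# & means "and": both conditions must hold
both = df[(df["age"] > 30) & (df["income"] < 60_000)]

# | means "or": either condition may hold
either = df[(df["age"] < 30) | (df["income"] > 80_000)]
```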

# Select specific columns
selected_columns_df = df[['column1', 'column2']]
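Single and double brackets return different types, and `.loc` can filter rows and pick columns in one step. A minimal sketch on an invented three-row table:

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({"name": ["Ana", "Ben", "Cy"],
                   "age": [28, 35, 41],
                   "city": ["Rome", "Oslo", "Lima"]})

# Single brackets return a Series; a list inside brackets returns a DataFrame
age_series = df["age"]
subset = df[["name", "age"]]

# .loc[row_mask, column_list] filters rows and selects columns together
over_30 = df.loc[df["age"] > 30, ["name", "city"]]
```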

# Sorting the DataFrame
df.sort_values(by='column_name', ascending=False, inplace=True)
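`sort_values` also accepts lists, which lets each sort key have its own direction. A sketch with invented `team` and `points` columns, sorting teams alphabetically and points from highest to lowest within each team:

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({"team": ["B", "A", "B", "A"], "points": [7, 9, 12, 3]})

# One ascending flag per column in 'by'
ranked = df.sort_values(by=["team", "points"], ascending=[True, False])

# Renumber rows 0..n-1; drop=True discards the old index instead of adding it as a column
ranked = ranked.reset_index(drop=True)
```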

# Create a new column based on existing columns
df['new_column'] = df['column1'] + df['column2']

# Apply a function to each value in a column (pass the function directly)
df['new_column'] = df['existing_column'].apply(your_function)
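Arithmetic between columns is vectorized, while `.apply` calls a Python function once per value, which is more flexible but usually slower. In the sketch below the data and the helper `price_band` are hypothetical, standing in for `your_function`:

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({"price": [4.0, 12.5, 30.0], "qty": [3, 2, 1]})

# Vectorized: multiplies the two columns element by element
df["total"] = df["price"] * df["qty"]

# Hypothetical helper for illustration: labels a price as 'low' or 'high'
def price_band(price):
    return "low" if price < 10 else "high"

# .apply calls price_band on every value of the 'price' column
df["band"] = df["price"].apply(price_band)
```

Prefer column arithmetic when an operation can be written that way, and reach for `.apply` when the logic needs conditionals or other per-value Python code.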

Analyse

# Plotting histograms
df['column_name'].hist()

# Box plot
sns.boxplot(x='column1', y='column2', data=df)

# Scatter plot
plt.scatter(df['column1'], df['column2'])
plt.xlabel('Column1')
plt.ylabel('Column2')
plt.title('Scatter Plot')
plt.show()

# Line graph
plt.plot(df['x'], df['y'])
plt.title('Sample Line Graph')
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.show()
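The line graph and histogram calls above can also be drawn side by side with matplotlib's explicit Figure/Axes interface. This is a sketch with invented `x` and `y` data. The `matplotlib.use("Agg")` line only makes it run as a plain script without a display; leave it out in a notebook, where figures are shown inline.

```python
import matplotlib
matplotlib.use("Agg")  # script-only: non-interactive backend, omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data only
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 1, 5]})

# One figure with two axes in a single row
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line graph on the left axes
ax_line.plot(df["x"], df["y"])
ax_line.set_title("Sample Line Graph")
ax_line.set_xlabel("X-axis label")
ax_line.set_ylabel("Y-axis label")

# Histogram of 'y' drawn by pandas into the right axes
df["y"].hist(ax=ax_hist, bins=3)
ax_hist.set_title("Histogram of y")

# Adjust spacing so titles and labels do not overlap
fig.tight_layout()
```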

First Steps

# Import all needed libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data from a CSV file
df = pd.read_csv('your_file.csv')

# Display the first few rows of the DataFrame
df.head()

# Get basic information about the DataFrame
df.info()

# Summary statistics for numerical columns
df.describe()
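To try these first steps without a file on disk, `pd.read_csv` also accepts a file-like object. In this sketch, the CSV text is invented and stands in for `your_file.csv`; the empty age field becomes NaN, and `describe()` summarizes only the numeric `age` column by default.

```python
import io

import pandas as pd

# Invented CSV content standing in for 'your_file.csv'
csv_text = """name,age,city
Ana,28,Rome
Ben,,Oslo
Cy,41,Lima
"""

# StringIO wraps the text so read_csv can read it like a file
df = pd.read_csv(io.StringIO(csv_text))

# First two rows; in a notebook, a cell's last expression is rendered as a table
preview = df.head(2)

# Count, mean, std, min, quartiles and max for numeric columns
summary = df.describe()
```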