Show Menu
Cheatography

SQL Cheatsheet for Data Analytics Cheat Sheet (DRAFT) by

This cheatsheet for Data Analytics by @Sourabh Sahu

This is a draft cheat sheet. It is a work in progress and is not finished yet.

RDBMS Softwares

Oracle
MYSQL
MS SQL Server
MariaDB

Follow @Sourabh Sahu

 

QUERYING DATA

Fetch all columns from the products table: SELECT * FROM products;
Fetch product IDs and names from the inventory table: SELECT produc­t_id, produc­t_name FROM inventory;
Fetch employee names sorted by their salary in descending order: SELECT employ­ee_name FROM employees ORDER BY salary DESC;
Fetch customer names sorted by their regist­ration date in ascending order (default): SELECT custom­er_name FROM customers ORDER BY regist­rat­ion­_date;

ALIASING (AS)

SELECT name AS produc­t_name FROM products;

INDEXING

Indexing improves data retrieval speed by creating a reference system. It enhances query perfor­mance but may slightly impact insert­/update operat­ions.
Clustered Index: Dictates physical data order, often used for primary keys. Non-Cl­ustered Index: Creates separate index struct­ures, allowing multiple indexes for specific columns, without affecting data storage order.

BETWEEN AND IN

IN checks if a column value matches any value in a list, while BETWEEN checks if a column value falls within a specified range.
SELECT * FROM product WHERE category IN ('Elec­tro­nics', 'Cloth­ing', 'Furni­ture');
SELECT * FROM product WHERE price BETWEEN 20 AND 50;

CONSTR­AINTS

Constr­aints are rules applied to database tables to ensure data accuracy, integrity, and consis­tency.
PRIMARY KEY enforces unique, non-null values in the "­pro­duc­t_i­d" column.
FOREIGN KEY links the "­man­ufa­ctu­rer­_id­" column to a related table.
NOT NULL ensures the "­pro­duc­t_n­ame­" must have a value.
CHECK enforces a condition, like ensuring "­pri­ce" is positive.
DEFAULT assigns a default value to a column when no value is provided during insertion.
CREATE TABLE product ( product_id INT PRIMARY KEY, produc­t_name VARCHA­R(255) NOT NULL, price DECIMA­L(10, 2) CHECK (price > 0), manufa­ctu­rer_id INT, FOREIGN KEY (manuf­act­ure­r_id) REFERENCES manufa­ctu­rer­(ma­nuf­act­ure­r_id), discount DECIMAL(5, 2) DEFAULT 0.00 -- Default discount set to 0.00 );

SUBQUERIES

Scalar Subquery: Returns a single value and can be used within SELECT, WHERE, or FROM clauses.
SELECT produc­t_name, (SELECT MAX(price) FROM products) AS max_price FROM products;
Single-Row Subquery: Subquery that returns a single row of results, typically used in comparison operat­ions.
SELECT produc­t_name FROM products WHERE price = (SELECT MAX(price) FROM products);
Multi-Row Subquery: Subquery that returns multiple rows, often used with IN, ANY, or ALL operators.
SELECT produc­t_name FROM products WHERE manufa­ctu­rer_id IN (SELECT manufa­ctu­rer_id FROM manufa­cturers WHERE country = 'USA');
Correlated Subquery: References values from the outer query within the subquery.
SELECT produc­t_name FROM products p WHERE price > (SELECT AVG(price) FROM products WHERE manufa­ctu­rer_id = p.manu­fac­tur­er_id);
Inline View Subquery: Creates a temporary table-like structure for complex subqueries within the FROM clause.
SELECT AVG(su­bqu­ery.av­g_p­rice) FROM (SELECT manufa­ctu­rer_id, AVG(price) AS avg_price FROM products GROUP BY manufa­ctu­rer_id) AS subquery;
 

SQL and NO SQL

Uses a struct­ured, tabular data model with predefined schemas.
Offers various flexible data models, suitable for semi-s­tru­ctured or unstru­ctured data.
Primarily designed for vertical scaling, can be complex to scale horizo­ntally
Designed for horizontal scalab­ility, making it easier to handle high traffic and large datasets.

WINDOW AND RANKING FUNCTIONS

Window functions allow you to perform calcul­ations across a set of rows related to the current row within the result set. Rank functions assign a rank or row number to each row based on specified criteria.
SELECT produc­t_name, price, ROW_NU­MBER() OVER (ORDER BY price) AS row_num, DENSE_­RANK() OVER (ORDER BY price) AS dense_­rank, SUM(price) OVER (PARTITION BY manufa­ctu­rer_id) AS manufa­ctu­rer­_total FROM products;
we use window functions like ROW_NU­MBER, DENSE_­RANK, and SUM with the "­pro­duc­ts" table to assign row numbers, dense ranks, and calculate the total price for each manufa­ctu­rer's products.
RANK assigns the same rank to rows with equal values, leaving gaps, while DENSE_RANK assigns consec­utive ranks without gaps, and ROW_NUMBER assigns a unique row number to each row.

SET OPERATION

INTERSECT: Returns the common rows between two SELECT statem­ents. Retrieves products that exist in both sets of results.
SELECT produc­t_id, produc­t_name FROM products INTERSECT SELECT produc­t_id, produc­t_name FROM some_o­the­r_t­able;
UNION: Combines the results of two SELECT statem­ents, removing duplic­ates. Retrieves all unique products from both sets of results.
SELECT produc­t_id, produc­t_name FROM products UNION SELECT produc­t_id, produc­t_name FROM some_o­the­r_t­able;
UNION ALL: Similar to UNION, but includes all rows, including duplic­ates. Retrieves all products from both sets of results, allowing duplic­ates.
SELECT produc­t_id, produc­t_name FROM products UNION ALL SELECT produc­t_id, produc­t_name FROM some_o­the­r_t­able;
EXCEPT: Returns rows that exist in the first SELECT statement but not in the second. Retrieves products that are in the "­pro­duc­ts" table but not in "­som­e_o­the­r_t­abl­e."
SELECT produc­t_id, produc­t_name FROM products EXCEPT SELECT produc­t_id, produc­t_name FROM some_o­the­r_t­able;

IS NULL / IS NOT NULL

IS NULL: Filters rows where the "­man­ufa­ctu­rer­_id­" is not assigned. SELECT * FROM product WHERE manufa­ctu­rer_id IS NULL;
IS NOT NULL: Filters rows where the "­man­ufa­ctu­rer­_id­" is assigned. SELECT * FROM product WHERE manufa­ctu­rer_id IS NOT NULL;

FILTERING

Retrieve all products with a price greater than $50: SELECT * FROM product WHERE price > 50;
Get products with a quantity in stock less than or equal to 10: SELECT * FROM product WHERE stock_­qua­ntity <= 10;
Retrieve products with a specific category (e.g., "­Ele­ctr­oni­cs"): SELECT * FROM product WHERE category = 'Elect­ron­ics';
Find products with names containing the word "­pho­ne": SELECT * FROM product WHERE produc­t_name LIKE '%phone%';
Get products added after a certain date (e.g., '2023-­01-­01'): SELECT * FROM product WHERE date_added > '2023-­01-01';
Retrieve products with a price between $20 and $30: SELECT * FROM product WHERE price BETWEEN 20 AND 30;
Find products with low stock and a price greater than $50: SELECT * FROM product WHERE stock_­qua­ntity < 5 AND price > 50;
Retrieve products from a specific manufa­cturer (e.g., "­Sam­sun­g"): SELECT * FROM product WHERE manufa­cturer = 'Samsung';
 

JOINS

INNER JOIN: Returns only the rows with matching values in both tables.
SELECT * FROM sales INNER JOIN products ON sales.p­ro­duct_id = produc­ts.p­ro­duc­t_id;
LEFT JOIN (or LEFT OUTER JOIN): Returns all sales records and their corres­ponding products. If a product has no sales, it still appears with NULL values in sales-­related columns.
SELECT * FROM products LEFT JOIN sales ON produc­ts.p­ro­duct_id = sales.p­ro­duc­t_id;
RIGHT JOIN (or RIGHT OUTER JOIN): Opposite of the LEFT JOIN. Returns all sales records and includes products that have no sales
SELECT * FROM sales RIGHT JOIN products ON sales.p­ro­duct_id = produc­ts.p­ro­duc­t_id;
FULL OUTER JOIN: Returns all sales records and all products, including those without sales. NULL values appear where there is no match.
SELECT * FROM products FULL OUTER JOIN sales ON produc­ts.p­ro­duct_id = sales.p­ro­duc­t_id;
SELF JOIN: A self join could be used if the "­sal­es" table contains inform­ation about salesp­eople who sell products. You might join the table with itself to identify salesp­eople who have sold the same products.
SELECT s1.sal­esp­ers­on_­name, s2.sal­esp­ers­on_name FROM sales AS s1 JOIN sales AS s2 ON s1.pro­duct_id = s2.pro­duct_id WHERE s1.sal­esp­ers­on_name <> s2.sal­esp­ers­on_­name;

AGGREG­ATION AND GROUPING

Aggreg­ation functions like COUNT, SUM, AVG, MAX, and MIN are used to perform calcul­ations on data. GROUP BY clause is used to group rows based on one or more columns. Aggreg­ation functions are applied to each group, providing summary inform­ation.
-- Count the total number of products SELECT COUNT(*) AS total_­pro­ducts FROM products;
-- Calculate the total price of all products SELECT SUM(price) AS total_­price FROM products;
-- Calculate the average price of products SELECT AVG(price) AS averag­e_price FROM products;
-- Find the highest product price SELECT MAX(price) AS highes­t_price FROM products;
-- Find the lowest product price SELECT MIN(price) AS lowest­_price FROM products;

DIMENS­IONAL MODELLING

A design approach for data wareho­using.
Organizes data into fact tables (quant­itative data) and dimension tables (descr­iptive data).
Simplifies complex queries, improves perfor­mance, and supports business analytics by creating a logical structure for data analysis.

PERFOR­MANCE OPTIMI­ZATION

Indexing: Use approp­riate indexes.
Optimize Queries: Write efficient SQL.
Limit Data Retrieval: Fetch only needed data.
Normal­iza­tion: Properly structure tables.
Stored Proced­ures: Use precom­piled proced­ures.
Table Partit­ioning: Divide large tables.
Regular Mainte­nance: Rebuild indexes, update stats.
Optimize Hardware: Ensure server resources.
Caching: Implement caching mechan­isms.
Use Proper Data Types: Choose suitable types.
Concur­rency Control: Manage transa­ctions carefully.

Data Types

CHAR(N): Fixed-­length character string with a specified maximum length of N.
VARCHA­R(N): Variab­le-­length character string with a maximum length of N.
VARCHAR and VARCHAR2 are used interc­han­geably in most databases, but VARCHAR2 is specific to Oracle databases.
INTEGE­R,F­LOA­T,B­OOLEAN
BLOB: Stores binary large objects, such as images or files.
DATETIME: Combines date and time in 'YYYY-­MM-DD HH:MM:SS' format.
DECIMAL(P, S): Fixed-­point decimal number with P total digits and S decimal places.
DATE: Stores a date value in the format 'YYYY-­MM-DD'.
TIME: Represents a time of day in 'HH:MM:SS' format.