Spark SQL Joins Cheat Sheet

Spark Joins that have a SQL Equivalent

Basic Syntax: 
df1.join(df2, df1("common_col_from1") === df2("common_col_from2"), "method").show()/.show(false)

Self:
 Though there is no self-join type available, we can use any of the above-explained join types to join DataFrame to itself.

 println("self join")
  empDF.as("emp1").join(empDF.as("emp2"),
    col("emp1.sup_emp_id") === col("emp2.emp_id"),"inner")
    .select(col("emp1.emp_id"),col("emp1.name"),
      col("emp2.emp_id").as("sup_emp_id"),
      col("emp2.name").as("sup_emp_name"))
      .show(false)
Inner:
  empDF.join(deptDF,empDF("emp_dept_id") ===  deptDF("dept_id"),"inner")
    .show(false)

Full outer:
    empDF.join(deptDF,empDF("emp_dept_id") ===  deptDF("dept_id"),"full")
    .show(false)

Left Outer:
empDF.join(deptDF,empDF("emp_dept_id") ===  deptDF("dept_id"),"left")
    .show(false)

Spark Joins that are not in SQL!

Left semi join 
Returns all rows from the left DF on records match in the right dataset on join expression, records not matched on join expression are ignored from both left and right datasets.
 
empDF.join(deptDF,empDF("emp_dept_id") ===  deptDF("dept_id"),"leftsemi")
    .show(false)

Left anti join 
leftanti join returns only columns from the left DataFrame/Dataset for non-matched records.
empDF.join(deptDF,empDF("emp_dept_id") ===  deptDF("dept_id"),"leftanti")
    .show(false)

Cross Join
use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;

 println("cross join")
  empDF.join(deptDF,empDF("emp_dept_id") ===  deptDF("dept_id"),"cross")
    .show(false)

  println("Using crossJoin()")
  empDF.crossJoin(deptDF).show(false)

Download the Spark SQL Joins Cheat Sheet

3 Pages

Latest Cheat Sheet

1 Page

(0)

AI Font Generator & Typography Tools Cheat Sheet

A quick reference for designers, founders, and creators using AI font generators and typography tools. Covers font formats, typography terms, AI font creation workflows, and useful tools for generating downloadable TTF and WOFF2 fonts.

hardwellee

30 Jun 26

design, font

Spark SQL Joins Cheat Sheet (DRAFT) by datamansam

Spark Joins that have a SQL Equivalent

Spark Joins that are not in SQL!

Latest Cheat Sheet

Random Cheat Sheet

About Cheatography

Behind the Scenes

Recent Cheat Sheet Activity

Please Disable Your Ad Blocker