Session
Data wrangling
Livy2.pyspark
Combining data
In PySpark, start by setting an alias for each table you want to join, so that columns can be referenced unambiguously afterwards:
df1 = TableA.alias('df1')
df2 = TableB.alias('df2')
Useful links:
http://www.learnbymarketing.com/1100/pyspark-joins-by-example/
http://www.learnbymarketing.com/618/pyspark-rdd-basics-examples/
DL Cheat Sheet (DRAFT) by woobidoobi
This is a draft cheat sheet. It is a work in progress and is not finished yet.