| Session
 Data wrangling
 | Livy2.pyspark
 Combining data
In PySpark, start by setting an alias for each table you want to join, so that columns with the same name in both tables can be referenced unambiguously:
df1 = TableA.alias('df1')
df2 = TableB.alias('df2')
Useful links:
http://www.learnbymarketing.com/1100/pyspark-joins-by-example/
http://www.learnbymarketing.com/618/pyspark-rdd-basics-examples/ |
            
Cheatography
https://cheatography.com

DL Cheat Sheet (DRAFT) by woobidoobi
This is a draft cheat sheet. It is a work in progress and is not finished yet.