Show Menu
Cheatography

PYTHON PANDAS Cheat Sheet by

install and import

installing pandas
pip install pandas
pip install pandas
import pandas as pd

Reading and describing

pd -> pandas
df-> dataframe
to read a file into a dataframe
df= pd.rea­d_c­sv(­'fi­len­ame')
look at the first 5 lines
df.he­ad()
to describe df
df.de­scr­ibe()
df.in­fo()
to print all the column names
telecom_data.columns
to get the dimension of df
df.shape

Sorting and filtering

sort
sorting can be done column wise - default is ascending
df.so­rt_­val­ues­(by­='Total day charge')
df.so­rt_­val­ues­(col1)
Sort values by col1 in ascending order (use ascending =False for descending sort)
df.so­rt_­val­ues­([c­ol1­,co­l2]­,as­cen­din­g=[­Tru­e,F­alse])
Sort values by col1 in ascending order then col2 in descending order
Filtering
df[co­ndi­tion]  ­ ­  #eg: df[df[­'co­l']­>5]
df[df­['col'] > 0.5]
Rows where the column col is greater than 0.5
df[(d­f[col] > 0.5) & (df[col] < 0.7)]
Rows where 0.7 > col > 0.5

Inplace

NOTE
df.me­rge­(df2) gives you a copy of df merged with df2. you may save it to a new variable. ex df3=d­f.m­erg­e(df2)
if you want to merge df2 to df right away use inplace. df.me­rge­(df­2,i­npl­ace­=True)
 

Rows and columns

to delete a row - [axis=0 means rows]
new_df = df.dro­p([­2,3­],axis = 0) #this drops the row with index 2,3
to delete a column- [axis=1 means columns]
new_df = df.dro­p([­'co­l1'­,'c­ol2­'],axis = 0) #this drops the column with name col1 and col2

Df manipu­lation

create or edit a new column
df['n­ew_­col­name'] = 5   #this creates a new new column with all values as 5
create a new column
df['n­ew_­col­name'] = [list of values]   #this creates a new column with list of values assigned to each corres­ponding row
NOTE : df['n­ew_­col­name'] = [list of values] throws an error if the no of items in [list of values] doesn't match no of rows
create or edit a new row
df.lo­c[i­nde­x_o­f_row] = [list of items]
NOTE : df.lo­c[i­nde­x_o­f_row] = [list of items] throws an error if the no of items doesn't match no of rows

Selection

df[col]
Returns column with label col as Series
df[[col1, col2]]
Returns multiple columns as a new DataFrame
 ­  Country  ­ ­Capital  ­ ­ ­ ­ ­ ­Pop­ulation
1 Belgiu­m   Brussels  ­ ­ ­ ­111­90846
2 India  ­ ­ ­ ­ ­ ­ New Delhi  ­130­3171035
3 Brazil  ­ ­ ­ ­ ­ ­Bra­silia  ­ ­ ­ ­ ­ ­207­847528
df.il­oc([0], [0]) --> 'Belgium'  | s.iloc[0] | Selection by position (0th position on row and column)
df.lo­c([0], ['Coun­try']) --> 'Belgium'
df.ix[2] -->
Country  ­ ­ ­ ­ ­  Brazil
Capital         Brasilia
Population   207847528
df.ix[1, 'Capit­al'] --> 'New Delhi'
df.ilo­c[0,:] | select First row
 

Data Cleaning

df.set­_in­dex­('c­olu­mn_­one')
Change the index with a new column
df.columns = ['new_­col­_na­me1­','­new­_co­l_n­ame­2',­'ne­w_c­ol_­name3']
Rename columns
pd.isn­ull()
Checks for null Values, Returns Boolean Arrray
pd.not­null()
Opposite of pd.isn­ull()
df.dro­pna()
Drop all rows that contain null values
df.dro­pna­(ax­is=1)
Drop all columns that contain null values
df.dro­pna­(ax­is=­1,t­hre­sh=n)
Drop all rows have have less than n non null
df.fil­lna(x)
Replace all null values with x

JOIN/C­OMBINE

df1.a­ppe­nd(­df2)
Adds the rows in df1 to the end of
df2 (columns should
be identical)
pd.co­nca­t([df1, df2],a­xis=1)
Adds the columns in
df1 to the end
of df2 (rows should be
identical)
df1.j­oin­(df­2,o­n=c­ol1­,ho­w='­inn­er')
joins the columns in df1 with the columns on
df2 where the rows
for col have identical
values. how can be
one of 'left',
'right', 'outer', 'inner'
left = takes the index of left df
right =takes the index of left
outer = union of both keys
inner = inters­ection of both keys
   
 

Comments

No comments yet. Add yours below!

Add a Comment

Your Comment

Please enter your name.

    Please enter your email address

      Please enter your Comment.

          Related Cheat Sheets