install and import
installing pandas pip install pandas
|
pip install pandas import pandas as pd
|
Reading and describing
pd -> pandas df-> dataframe |
to read a file into a dataframe df= pd.read_csv('filename')
|
look at the first 5 lines df.head()
|
to describe df df.describe()
|
|
to print all the column names telecom_data.columns
|
to get the dimension of df df.shape
|
Sorting and filtering
sort sorting can be done column wise - default is ascending df.sort_values(by='Total day charge')
|
|
Sort values by col1 in ascending order (use ascending =False for descending sort) |
df.sort_values([col1,col2],ascending=[True,False])
|
Sort values by col1 in ascending order then col2 in descending order |
Filtering df[condition] #eg: df[df['col']>5]
|
|
Rows where the column col is greater than 0.5 |
df[(df[col] > 0.5) & (df[col] < 0.7)]
|
Rows where 0.7 > col > 0.5 |
Inplace
NOTE |
df.merge(df2)
gives you a copy of df merged with df2. you may save it to a new variable. ex df3=df.merge(df2)
|
if you want to merge df2 to df right away use inplace. df.merge(df2,inplace=True)
|
|
|
Rows and columns
to delete a row - [axis=0 means rows] new_df = df.drop([2,3],axis = 0) #this drops the row with index 2,3
|
to delete a column- [axis=1 means columns] new_df = df.drop(['col1','col2'],axis = 0) #this drops the column with name col1 and col2
|
Df manipulation
create or edit a new column df['new_colname'] = 5 #this creates a new new column with all values as 5
|
create a new column df['new_colname'] = [list of values] #this creates a new column with list of values assigned to each corresponding row
|
NOTE : df['new_colname'] = [list of values]
throws an error if the no of items in [list of values] doesn't match no of rows |
create or edit a new row |
df.loc[index_of_row] = [list of items]
|
NOTE : df.loc[index_of_row] = [list of items]
throws an error if the no of items doesn't match no of rows |
Selection
|
Returns column with label col as Series |
|
Returns multiple columns as a new DataFrame |
Country Capital Population 1 Belgium Brussels 11190846 2 India New Delhi 1303171035 3 Brazil Brasilia 207847528 |
df.iloc([0], [0])
--> 'Belgium' | s.iloc[0] | Selection by position (0th position on row and column) |
df.loc([0], ['Country'])
--> 'Belgium' |
df.ix[2]
--> Country Brazil Capital Brasilia Population 207847528 |
df.ix[1, 'Capital']
--> 'New Delhi' |
df.iloc[0,:] | select First row |
|
|
Data Cleaning
df.set_index('column_one') |
Change the index with a new column |
df.columns = ['new_col_name1','new_col_name2','new_col_name3'] |
Rename columns |
pd.isnull() |
Checks for null Values, Returns Boolean Arrray |
pd.notnull() |
Opposite of pd.isnull() |
df.dropna() |
Drop all rows that contain null values |
df.dropna(axis=1) |
Drop all columns that contain null values |
df.dropna(axis=1,thresh=n) |
Drop all rows have have less than n non null |
df.fillna(x) |
Replace all null values with x |
JOIN/COMBINE
|
Adds the rows in df1 to the end of df2 (columns should be identical) |
pd.concat([df1, df2],axis=1)
|
Adds the columns in df1 to the end of df2 (rows should be identical) |
df1.join(df2,on=col1,how='inner')
|
joins the columns in df1 with the columns on df2 where the rows for col have identical values. how can be one of 'left', 'right', 'outer', 'inner' |
left = takes the index of left df right =takes the index of left outer = union of both keys inner = intersection of both keys |
|
Created By
Metadata
Favourited By
Comments
No comments yet. Add yours below!
Add a Comment
Related Cheat Sheets