14
Define a function =
def hello():
print( ‘Hello World’ )
Then invoke it with = hello()
Define a function with a PARAMETER (argument)
Def welcome(name):
Print ( f ‘ Hello, {name} ‘ )
Invoke it with = welcome( ‘Amy’ )
Two required Parameters:
Def welcome_greeting(name, greeting_text):
print( f ‘ Hey, {name}. {greeting_text} ‘)
Invoke it with = welcome_greeting( ‘Liz’ , ‘How are you?’ ) - these are known as KWARG
Define a method to do a calculation
Def exponent(base, exponent):
Power = base ** exponent
Return power
Num1 = 2
Num2 = 3
Answer = exponent(num1, num2)
print(answer)
print( exponent(2,3))
Def sum_of_numbers( *parameters):
Total = 0
For each_number in parameters:
Total = total + each_number
Return total
Sum = sum_of_numbers(1,2,3,4,5,6)
print(sum
|
16
Df = pd.read_csv(url)
print(df.to_string())
Null is null, a null is something = a special creation to indicate the absence of a value - its a made up value
Df.shape = not a method does not need ()
Look at only one column = df[‘School Name’]
print( df[ ‘School Name’ ].to_string( ))
Find the unique names= df[‘School Name’].unique()
type(unique_schools) shows the type.. This is not a data frame
Statistics:
df[‘Starting Salary’].max() or df[‘Starting Salary’].mean() or df[‘Starting Salary’].min()
Find the NAs = df[‘Starting Salary’].isna() then to count the trues = na_rows.sum()
Based on a condition
Df2 = df.query(“ Starting Salary > 75000 “)
|
20
Change individual values
Df.loc[20, ‘Starting Salary’] = ‘ ‘
Convert to numeric type = df[‘Starting Salary’] = pd.to_numeric( df[‘Starting Salary’] )
Df.loc[139, ‘Starting Salary’] = 46000
#Convert starting salary to numeric FORCE CONVERT or “COERCE” conversion
Error_columns = pd.to_numeric( df[‘Starting Salary’], errors= ‘coerce’)
print(error_columns)
#find the NAs
Nas = error_columns.isna()
print(Nas)
Df[20:25]
#fix columns
Df.loc[70, ‘Starting Salary’] = 42600
df[Nas]
Save it to the original by overwriting
df[‘Starting Salary’] = pd.to_numeric(df[‘Starting Salary’])
|
|
|
15
Import pandas as pd
Data_list = [45, 74, 78]
Series_of_numbers = pd.series(data_list)
print(series_of_numbers[1])
Years = [2021, 2022, 2023]
Create series with labels and use KWARG
Series_of_numbers = pd.series(data=data_list, index=years)
print(series_of_numbers)
Show me the value for 2021
print(series_of_numbers[2021])
Create a series with integrated data labels
Grade_distribution = {‘A’ : 34, ‘B’ : 56}
Convert the dictionary ^ to a series = grade_series = pd.series(data=grade_distribution)
print(grade_series) or print(grade_series[‘A’])
2 dimensional data - in multiple lists
Quiz_scores = {
‘Quiz1’ : [32, 56, 56] ,
‘Quiz2’ : [78, 34, 32]}
Df = pd.DataFrame(data=quiz_scores)
print(df)
Overwrite the df like this:
Df = pd.DataFrame(data=quiz_scores, index=[‘Mike’ , ‘Susan’, ‘Amy’]
df.head() = top 5 rows df.tail() = bottom 5 rows df[40:60] = select row
|
18
Find all the schools with the name Pitt
Df2 = df.query( “ School Name == ‘Pitt’ “)
df2.head()
Remove a column: df.drop(columns=’Starting Salary’, inplace = True)
Or df = df.drop(columns=’Starting Salary’)
Drop a row df.drop(index=2, inplace = True)
Delete entire row of data when one column had missing data df=df.dropna()
|
19
Load descriptives for the df = df.describe()
Load tab-delimitted file
Df2 = pd.read_csv(URL, sep=’\t’ )
Replace function:
Df[ ‘School Name’].replace(‘-’, ‘ -’, regex=True, inplace=True)
Fillnas = df[‘Starting Salary’].fillna(0, inplace=True)
How many unique school names are there:
len( df[‘School Name’].unique())
Show only the rows in which df are duplicate:
Duplicates = df.duplicated(subset= ‘School Name’)
Boolean series = df[duplicates]
Df2 = df.drop_duplicates(subset=’School Name’, keep=’first’)
Find out schools with specified
PA_schools = df2[‘School Name’].str.contains(‘Pennsylvania’)
Use a boolean series df2[PA_schools]
Overwrite instead on inplace
Df2 = df2.sort_values(‘Starting Salary’, ascending = False)
Fix one bad value:
Df2.loc[2, ‘Starting Salary’] = df2[‘Starting Salary’].mean()
|
19
Load descriptives for the df = df.describe()
Load tab-delimitted file
Df2 = pd.read_csv(URL, sep=’\t’ )
Replace function:
Df[ ‘School Name’].replace(‘-’, ‘ -’, regex=True, inplace=True)
Fillnas = df[‘Starting Salary’].fillna(0, inplace=True)
How many unique school names are there:
len( df[‘School Name’].unique())
Show only the rows in which df are duplicate:
Duplicates = df.duplicated(subset= ‘School Name’)
Boolean series = df[duplicates]
Df2 = df.drop_duplicates(subset=’School Name’, keep=’first’)
Find out schools with specified
PA_schools = df2[‘School Name’].str.contains(‘Pennsylvania’)
Use a boolean series df2[PA_schools]
Overwrite instead on inplace
Df2 = df2.sort_values(‘Starting Salary’, ascending = False)
Fix one bad value:
Df2.loc[2, ‘Starting Salary’] = df2[‘Starting Salary’].mean()
|
|