Exam 3 Cheat Sheet

14

Define a function:

def hello():
    print('Hello World')

Then invoke it with: hello()

Define a function with a PARAMETER (the name in the definition; the value you pass in when calling is the ARGUMENT)

def welcome(name):
    print(f'Hello, {name}')

Invoke it with: welcome('Amy')

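Two required parameters:

def welcome_greeting(name, greeting_text):
    print(f'Hey, {name}. {greeting_text}')

Invoke it with: welcome_greeting('Liz', 'How are you?') - these are POSITIONAL arguments (matched by order).
Keyword arguments (KWARGS) name each parameter: welcome_greeting(name='Liz', greeting_text='How are you?')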
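To compare positional and keyword arguments side by side, here is a small sketch. It is a variant of welcome_greeting that returns the message instead of printing it (an assumption made so the two calls can be compared):

```python
def welcome_greeting(name, greeting_text):
    # Returns the message (instead of printing it) so the two calls can be compared
    return f'Hey, {name}. {greeting_text}'

# Positional arguments: matched to parameters by their order
positional = welcome_greeting('Liz', 'How are you?')

# Keyword arguments (kwargs): matched by parameter name, so the order can change
keyword = welcome_greeting(greeting_text='How are you?', name='Liz')

print(positional)             # Hey, Liz. How are you?
print(positional == keyword)  # True
```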

Define a function to do a calculation and RETURN the result

def exponent(base, exponent):
    # inside the function, 'exponent' means the parameter, not the function
    power = base ** exponent
    return power

num1 = 2
num2 = 3
answer = exponent(num1, num2)
print(answer)

print(exponent(2, 3))
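The difference between printing a value and returning it matters once you want to reuse the result. A minimal sketch with two hypothetical helper functions:

```python
def exponent_printed(base, exponent):
    # Only prints; with no return statement the function returns None
    print(base ** exponent)

def exponent_returned(base, exponent):
    # Hands the value back to the caller so it can be stored or reused
    return base ** exponent

printed_result = exponent_printed(2, 3)   # prints 8
returned_result = exponent_returned(2, 3)

print(printed_result)        # None
print(returned_result + 1)   # 9
```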

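Accept any number of arguments with *

def sum_of_numbers(*parameters):
    # *parameters collects all positional arguments into a tuple
    total = 0
    for each_number in parameters:
        total = total + each_number
    return total

# avoid naming the variable 'sum', which would hide the built-in sum()
numbers_sum = sum_of_numbers(1, 2, 3, 4, 5, 6)
print(numbers_sum)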
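A single star collects extra positional arguments into a tuple, and a double star collects extra keyword arguments (kwargs) into a dictionary. This is a small sketch with a hypothetical function show_args:

```python
def show_args(*parameters, **keyword_parameters):
    # *parameters packs extra positional arguments into a tuple;
    # **keyword_parameters packs extra keyword arguments into a dict
    return parameters, keyword_parameters

positional, keyword = show_args(1, 2, 3, name='Amy', year=2023)
print(positional)       # (1, 2, 3)
print(keyword)          # {'name': 'Amy', 'year': 2023}
print(sum(positional))  # 6
```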

16

df = pd.read_csv(url)

print(df.to_string())

Null: a special marker meaning "no value here". It is a made-up value, not 0 and not an empty string. pandas usually shows missing numbers as NaN.

df.shape is an attribute, not a method, so it takes no (): it gives (rows, columns)
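Because the course CSV URL is not given here, this sketch reads a few hypothetical rows from an in-memory string with io.StringIO. It shows that a blank field becomes NaN and that shape is read without parentheses:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the course CSV (values are made up)
csv_text = """School Name,Starting Salary
Pitt,52000
Penn State,
Temple,48000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.to_string())
print(df.shape)  # (3, 2): 3 rows, 2 columns
print(df['Starting Salary'].isna().sum())  # 1 (the blank salary was read as NaN)
```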

Look at only one column: df['School Name']

print(df['School Name'].to_string())

Find the unique names: unique_schools = df['School Name'].unique()

type(unique_schools) shows the type. This is an array, not a DataFrame.
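A small sketch of unique() on hypothetical rows. It also shows nunique(), which counts the distinct values directly (by default it skips missing values):

```python
import pandas as pd

# Hypothetical rows; 'Pitt' appears twice
df = pd.DataFrame({'School Name': ['Pitt', 'Temple', 'Pitt', 'Drexel']})

unique_schools = df['School Name'].unique()
print(unique_schools)        # Pitt, Temple, Drexel in order of first appearance
print(type(unique_schools))  # an array type, not a DataFrame
print(len(unique_schools))   # 3
print(df['School Name'].nunique())  # 3
```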

Statistics:

df['Starting Salary'].max() or df['Starting Salary'].mean() or df['Starting Salary'].min()

Find the NAs: na_rows = df['Starting Salary'].isna(), then count the Trues with na_rows.sum()
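Summing a Boolean Series counts the True values, and max/mean/min skip NaN by default. A sketch on a hypothetical salary column:

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with one missing value
salaries = pd.Series([40000, 50000, np.nan, 60000], name='Starting Salary')

na_rows = salaries.isna()  # Boolean Series: True where the value is missing
print(na_rows.sum())       # 1, because True counts as 1 when summed

# max/mean/min ignore NaN by default (skipna=True)
print(salaries.max(), salaries.mean(), salaries.min())  # 60000.0 50000.0 40000.0
```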

Based on a condition (wrap column names that contain spaces in backticks):

df2 = df.query("`Starting Salary` > 75000")
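The same filter can be written with query() or with a Boolean mask. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows
df = pd.DataFrame({
    'School Name': ['Pitt', 'Temple', 'Drexel'],
    'Starting Salary': [80000, 70000, 76000],
})

# Backticks let query() refer to a column whose name contains a space
df2 = df.query('`Starting Salary` > 75000')

# The same filter written as a Boolean mask
df3 = df[df['Starting Salary'] > 75000]

print(df2)
print(df2.equals(df3))  # True
```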

20

Change individual values

df.loc[20, 'Starting Salary'] = ' '

Convert to numeric type: df['Starting Salary'] = pd.to_numeric(df['Starting Salary']) (raises a ValueError if any value cannot be parsed as a number)

df.loc[139, 'Starting Salary'] = 46000

# Convert Starting Salary to numeric: FORCE convert, or "COERCE"; values that cannot be parsed become NaN

error_columns = pd.to_numeric(df['Starting Salary'], errors='coerce')

print(error_columns)

# find the NAs

nas = error_columns.isna()

print(nas)

df[20:25]

# fix individual values

df.loc[70, 'Starting Salary'] = 42600

df[nas]   # shows the rows that failed to convert

Save it to the original by overwriting

df['Starting Salary'] = pd.to_numeric(df['Starting Salary'])
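The coerce, find, fix, convert workflow above can be sketched end to end on a hypothetical text column. The row labels and replacement values here are made up for illustration:

```python
import pandas as pd

# Hypothetical column read in as text: one blank cell and one value with a comma
df = pd.DataFrame({'Starting Salary': ['42000', ' ', '46,000', '51000']})

# errors='coerce' turns anything it cannot parse into NaN instead of raising ValueError
converted = pd.to_numeric(df['Starting Salary'], errors='coerce')
nas = converted.isna()
bad_rows = list(df[nas].index)
print(bad_rows)  # [1, 2]

# Fix each bad cell by its row label, then convert the original column for real
df.loc[1, 'Starting Salary'] = '44000'
df.loc[2, 'Starting Salary'] = '46000'
df['Starting Salary'] = pd.to_numeric(df['Starting Salary'])
print(df['Starting Salary'].tolist())  # [42000, 44000, 46000, 51000]
```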

15

import pandas as pd

data_list = [45, 74, 78]

series_of_numbers = pd.Series(data_list)

print(series_of_numbers[1])

years = [2021, 2022, 2023]

Create a Series with labels, using keyword arguments (KWARGS):

series_of_numbers = pd.Series(data=data_list, index=years)

print(series_of_numbers)

Show me the value for 2021:

print(series_of_numbers[2021])
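With integer labels, square brackets look up by label, not by position. .loc and .iloc make the choice explicit. A sketch using the data above:

```python
import pandas as pd

data_list = [45, 74, 78]
years = [2021, 2022, 2023]
series_of_numbers = pd.Series(data=data_list, index=years)

print(series_of_numbers[2021])      # 45 (label lookup)
print(series_of_numbers.loc[2022])  # 74 (.loc is always label-based)
print(series_of_numbers.iloc[0])    # 45 (.iloc is always position-based)

# There is no label 0, so [0] is a missing label here, not "the first item"
try:
    series_of_numbers[0]
except KeyError:
    print('no label 0')
```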

Create a Series with built-in data labels

grade_distribution = {'A': 34, 'B': 56}

Convert the dictionary above to a Series: grade_series = pd.Series(data=grade_distribution)

print(grade_series) or print(grade_series['A'])
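When a dictionary becomes a Series, its keys become the index labels and its values become the data. A short sketch:

```python
import pandas as pd

grade_distribution = {'A': 34, 'B': 56}
grade_series = pd.Series(data=grade_distribution)

print(list(grade_series.index))  # ['A', 'B']
print(grade_series['A'])         # 34
print(grade_series.sum())        # 90
```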

2-dimensional data: a dictionary of lists (each list becomes a column)

quiz_scores = {
    'Quiz1': [32, 56, 56],
    'Quiz2': [78, 34, 32],
}

df = pd.DataFrame(data=quiz_scores)

print(df)

Overwrite the df with row labels like this:

df = pd.DataFrame(data=quiz_scores, index=['Mike', 'Susan', 'Amy'])

df.head() = top 5 rows
df.tail() = bottom 5 rows
df[40:60] = select rows 40 through 59 by position (the end is excluded)
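A sketch of the labelled quiz DataFrame. It shows a row slice, a single cell lookup by row and column label, and a column statistic:

```python
import pandas as pd

quiz_scores = {
    'Quiz1': [32, 56, 56],
    'Quiz2': [78, 34, 32],
}

# Each dict key becomes a column; index= supplies the row labels
df = pd.DataFrame(data=quiz_scores, index=['Mike', 'Susan', 'Amy'])

print(df.head(2))                # first 2 rows (head() defaults to 5)
print(df[1:3])                   # rows at positions 1 and 2; position 3 is excluded
print(df.loc['Susan', 'Quiz1'])  # 56
print(df['Quiz1'].mean())        # 48.0
```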

18

Find all the schools with the name Pitt

df2 = df.query("`School Name` == 'Pitt'")

df2.head()

Remove a column: df.drop(columns='Starting Salary', inplace=True)

Or df = df.drop(columns='Starting Salary')

Drop a row: df.drop(index=2, inplace=True)

Delete every row that has missing data in any column: df = df.dropna()
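A sketch of querying, dropping, and dropna() on hypothetical rows. Each call returns a new DataFrame, so the original df is left unchanged:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; row 1 has a missing salary
df = pd.DataFrame({
    'School Name': ['Pitt', 'Temple', 'Drexel', 'Pitt'],
    'Starting Salary': [52000, np.nan, 61000, 55000],
})

pitt_rows = df.query("`School Name` == 'Pitt'")  # backticks for the space in the name
no_missing = df.dropna()                         # drops row 1
no_salary = df.drop(columns='Starting Salary')   # removes the column
no_row_2 = df.drop(index=2)                      # removes the row labelled 2

print(len(pitt_rows), len(no_missing), no_salary.shape, list(no_row_2.index))
# 2 3 (4, 1) [0, 1, 3]
```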

19

Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns: df.describe()

Load a tab-delimited file

df2 = pd.read_csv(url, sep='\t')

Replace function (regex=True replaces '-' anywhere inside each name, not only cells that are exactly '-'):

df['School Name'] = df['School Name'].replace('-', ' -', regex=True)
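This sketch shows how regex=True changes what Series.replace matches. The names are hypothetical:

```python
import pandas as pd

names = pd.Series(['Penn State-Erie', '-', 'Pitt-Johnstown'])

# Without regex=True, only cells that equal '-' in full are replaced
exact = names.replace('-', ' -')
# With regex=True, every '-' inside each string is replaced
partial = names.replace('-', ' -', regex=True)

print(list(exact))    # ['Penn State-Erie', ' -', 'Pitt-Johnstown']
print(list(partial))  # ['Penn State -Erie', ' -', 'Pitt -Johnstown']
```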

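Fill the NAs and assign the result back (with inplace=True the method returns None, so storing its result gives None): df['Starting Salary'] = df['Starting Salary'].fillna(0)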
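A sketch of why the result of fillna(..., inplace=True) should not be stored, using a hypothetical salary column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Starting Salary': [50000, np.nan, 70000]})

# fillna() returns a new Series; assign it back to keep the change
df['Starting Salary'] = df['Starting Salary'].fillna(0)
print(df['Starting Salary'].tolist())  # [50000.0, 0.0, 70000.0]

# With inplace=True the method changes the Series itself and returns None
result = pd.Series([1.0, np.nan]).fillna(0, inplace=True)
print(result)  # None
```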

How many unique school names are there:

len(df['School Name'].unique())

Show only the duplicate rows (every repeat after the first of each School Name):

duplicates = df.duplicated(subset='School Name')

duplicates is a Boolean Series, so filter with it: df[duplicates]

df2 = df.drop_duplicates(subset='School Name', keep='first')
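A sketch of duplicated() and drop_duplicates() on hypothetical rows. With the default keep='first', the first occurrence of each name is not flagged:

```python
import pandas as pd

# Hypothetical rows; Pitt and Temple each appear twice
df = pd.DataFrame({
    'School Name': ['Pitt', 'Temple', 'Pitt', 'Drexel', 'Temple'],
    'Starting Salary': [52000, 48000, 55000, 61000, 50000],
})

duplicates = df.duplicated(subset='School Name')
print(duplicates.tolist())  # [False, False, True, False, True]
print(df[duplicates])       # only the repeated rows (labels 2 and 4)

df2 = df.drop_duplicates(subset='School Name', keep='first')
print(df2['School Name'].tolist())      # ['Pitt', 'Temple', 'Drexel']
print(len(df['School Name'].unique()))  # 3
```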

Find schools whose name contains specific text:

PA_schools = df2['School Name'].str.contains('Pennsylvania')

Use the Boolean Series to filter: df2[PA_schools]
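A sketch of str.contains on hypothetical names, one of which is missing. na=False treats a missing name as "no match", so the mask contains only True/False, and case=False ignores capitalization:

```python
import pandas as pd

# Hypothetical school names; one is missing
df2 = pd.DataFrame({'School Name': [
    'University of Pennsylvania', 'Penn State', None, 'Pennsylvania College of Technology',
]})

PA_schools = df2['School Name'].str.contains('Pennsylvania', na=False)
print(PA_schools.tolist())  # [True, False, False, True]
print(df2[PA_schools])

# case=False also matches 'pennsylvania' or 'PENNSYLVANIA'
print(df2['School Name'].str.contains('pennsylvania', case=False, na=False).sum())  # 2
```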

Overwrite instead of using inplace:

df2 = df2.sort_values('Starting Salary', ascending=False)

Fix one bad value:

df2.loc[2, 'Starting Salary'] = df2['Starting Salary'].mean()
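A sketch of fixing one cell by label and then re-sorting by overwriting, on hypothetical rows. Note that the mean used here still includes the bad value being replaced:

```python
import pandas as pd

# Hypothetical rows; row 2 holds a bad placeholder salary of 0
df2 = pd.DataFrame({
    'School Name': ['Pitt', 'Temple', 'Drexel'],
    'Starting Salary': [52000.0, 48000.0, 0.0],
})

# Overwrite the bad cell with the column mean (this mean includes the bad 0)
df2.loc[2, 'Starting Salary'] = df2['Starting Salary'].mean()

# sort_values returns a new DataFrame; assign it back instead of using inplace=True
df2 = df2.sort_values('Starting Salary', ascending=False)
print(df2['School Name'].tolist())  # ['Pitt', 'Temple', 'Drexel']
```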


exam 3 Cheat Sheet (DRAFT) by rjyurk100
