Cheatography

exam 3 Cheat Sheet (DRAFT) by

JK;LGAKJD;KLJAJKDLGAS;LKAGK;JD

This is a draft cheat sheet. It is a work in progress and is not finished yet.

14

Define a function:

def hello():
    print('Hello World')

Then invoke it with hello()

Define a function with a parameter (the value passed in when calling it is the argument):

def welcome(name):
    print(f'Hello, {name}')

Invoke it with welcome('Amy')

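Two required parameters:

def welcome_greeting(name, greeting_text):
    print(f'Hey, {name}. {greeting_text}')

Invoke it with welcome_greeting('Liz', 'How are you?'). These are positional arguments, matched to the parameters by order. Passing them by name, as in welcome_greeting(name='Liz', greeting_text='How are you?'), uses keyword arguments (kwargs).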
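The difference between positional and keyword arguments can be sketched as follows. This is a small variation on the function above that returns the text instead of printing it, purely so the two calls can be compared; that change is an assumption for illustration.

```python
def welcome_greeting(name, greeting_text):
    # Return the greeting (instead of printing) so the results can be compared.
    return f'Hey, {name}. {greeting_text}'

# Positional arguments: matched to the parameters by order.
positional = welcome_greeting('Liz', 'How are you?')

# Keyword arguments (kwargs): matched by parameter name, so order does not matter.
keyword = welcome_greeting(greeting_text='How are you?', name='Liz')

print(positional)
print(keyword)
```

Both calls produce the same text; keyword arguments mainly help readability when a function takes several parameters.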

Define a function to do a calculation and return the result:

def exponent(base, exponent):
    power = base ** exponent
    return power

num1 = 2
num2 = 3
answer = exponent(num1, num2)
print(answer)

Or call it directly inside print:

print(exponent(2, 3))

Accept any number of arguments with *parameters:

def sum_of_numbers(*parameters):
    total = 0
    for each_number in parameters:
        total = total + each_number
    return total

total_sum = sum_of_numbers(1, 2, 3, 4, 5, 6)
print(total_sum)
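A short sketch of how the starred parameter behaves: inside the function it is a tuple holding every positional argument, so the function also works with zero arguments or with an existing list unpacked by `*`. The `scores` list is a made-up example.

```python
def sum_of_numbers(*parameters):
    # parameters is a tuple of every positional argument passed in.
    total = 0
    for each_number in parameters:
        total = total + each_number
    return total

no_arguments = sum_of_numbers()                 # empty tuple, the loop never runs
six_numbers = sum_of_numbers(1, 2, 3, 4, 5, 6)

scores = [10, 20, 30]                           # hypothetical list for illustration
from_list = sum_of_numbers(*scores)             # * unpacks the list into separate arguments

print(no_arguments, six_numbers, from_list)
```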

16

Load a CSV file into a DataFrame (url holds the file's address) and print every row:

df = pd.read_csv(url)
print(df.to_string())

Null = a special marker that indicates the absence of a value. It is a placeholder, not real data; pandas usually displays it as NaN.
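A minimal sketch of what a null looks like in pandas, using made-up salary values. The missing entry is stored as NaN, and statistics such as mean() skip it by default.

```python
import pandas as pd

# Made-up values; None marks the missing entry.
salaries = pd.Series([50000, None, 62000])

print(salaries)                 # the missing entry is displayed as NaN

missing = salaries.isna()       # True where the value is absent
average = salaries.mean()       # NaN is skipped, so this averages the two real values
```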

df.shape = an attribute, not a method, so it does not need (). It gives (number of rows, number of columns).

Look at only one column = df['School Name']

print(df['School Name'].to_string())

Find the unique names = unique_schools = df['School Name'].unique()

type(unique_schools) shows the type. The result is an array, not a DataFrame.
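As a sketch with a made-up three-row DataFrame, unique() hands back an array of the distinct values in order of first appearance, and nunique() counts them directly (note that nunique() ignores missing values by default).

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'School Name': ['Pitt', 'Penn State', 'Pitt']})

unique_schools = df['School Name'].unique()
print(type(unique_schools))      # an array type, not a DataFrame

count = df['School Name'].nunique()
print(list(unique_schools), count)
```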

Statistics:

df['Starting Salary'].max() or df['Starting Salary'].mean() or df['Starting Salary'].min()

Find the NAs = na_rows = df['Starting Salary'].isna(), then count the Trues with na_rows.sum()
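Counting the NAs works because True counts as 1 and False as 0 when summed. A small sketch with made-up salaries:

```python
import pandas as pd

# Made-up data: two of the four salaries are missing.
df = pd.DataFrame({'Starting Salary': [42000, None, 51000, None]})

na_rows = df['Starting Salary'].isna()   # Boolean series: True where the value is missing
na_count = na_rows.sum()                 # each True adds 1

highest = df['Starting Salary'].max()    # missing values are skipped
average = df['Starting Salary'].mean()

print(na_count, highest, average)
```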

Filter rows based on a condition (wrap column names that contain spaces in backticks):

df2 = df.query("`Starting Salary` > 75000")
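A sketch of the query on made-up data, next to the equivalent Boolean-mask filter. The school names and salaries here are invented for illustration.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    'School Name': ['Pitt', 'Penn State', 'Temple'],
    'Starting Salary': [78000, 72000, 80000],
})

# Backticks let query() refer to a column name that contains a space.
df2 = df.query("`Starting Salary` > 75000")

# The same filter written as a Boolean mask.
same_rows = df[df['Starting Salary'] > 75000]

print(df2)
```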

20

Change individual values

df.loc[20, 'Starting Salary'] = ' '

Convert to numeric type = df['Starting Salary'] = pd.to_numeric(df['Starting Salary'])

df.loc[139, 'Starting Salary'] = 46000

# Convert Starting Salary to numeric with a forced ("coerce") conversion: values that cannot be parsed become NaN instead of raising an error
error_columns = pd.to_numeric(df['Starting Salary'], errors='coerce')
print(error_columns)

# Find the NAs
nas = error_columns.isna()
print(nas)

df[20:25]

# Fix the bad values
df.loc[70, 'Starting Salary'] = 42600

# Show the rows that failed to convert
df[nas]

Save it to the original by overwriting the column:

df['Starting Salary'] = pd.to_numeric(df['Starting Salary'])
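The whole clean-up loop can be sketched on a tiny made-up column: coerce to find the bad rows, fix them by hand, then convert for real. The values and row numbers below are invented, and the fixes are written as text here so the column stays a single type until the final conversion.

```python
import pandas as pd

# Made-up column read in as text, with two unparseable entries.
df = pd.DataFrame({'Starting Salary': ['45000', 'N/A', '52,000', '61000']})

# Step 1: coerce, so bad values become NaN instead of raising an error.
converted = pd.to_numeric(df['Starting Salary'], errors='coerce')

# Step 2: find the rows that failed.
nas = converted.isna()
print(df[nas])

# Step 3: fix each bad row by hand (hypothetical corrections).
df.loc[1, 'Starting Salary'] = '48000'
df.loc[2, 'Starting Salary'] = '52000'

# Step 4: every value now parses, so overwrite the original column.
df['Starting Salary'] = pd.to_numeric(df['Starting Salary'])
print(df)
```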
 

15

import pandas as pd

data_list = [45, 74, 78]
series_of_numbers = pd.Series(data_list)
print(series_of_numbers[1])

years = [2021, 2022, 2023]

Create a series with labels, passing keyword arguments (kwargs):

series_of_numbers = pd.Series(data=data_list, index=years)
print(series_of_numbers)

Show me the value for 2021:

print(series_of_numbers[2021])
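A short sketch of label versus position lookups. Once the index holds the years, square brackets look up by label; .loc and .iloc make the choice explicit.

```python
import pandas as pd

data_list = [45, 74, 78]
years = [2021, 2022, 2023]

default_index = pd.Series(data_list)              # labels are 0, 1, 2
labeled = pd.Series(data=data_list, index=years)  # labels are the years

second_value = default_index[1]    # label 1 is the second item
by_label = labeled.loc[2021]       # look up by label
by_position = labeled.iloc[0]      # look up by position (first item)

print(second_value, by_label, by_position)
```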

Create a series with integrated data labels

grade_distribution = {'A': 34, 'B': 56}

Convert the dictionary above to a series = grade_series = pd.Series(data=grade_distribution)

print(grade_series) or print(grade_series['A'])

Two-dimensional data, held in multiple lists:

quiz_scores = {
    'Quiz1': [32, 56, 56],
    'Quiz2': [78, 34, 32]}

df = pd.DataFrame(data=quiz_scores)
print(df)

Overwrite the df with row labels like this:

df = pd.DataFrame(data=quiz_scores, index=['Mike', 'Susan', 'Amy'])

df.head() = top 5 rows
df.tail() = bottom 5 rows
df[40:60] = select rows 40 through 59
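A sketch of these row selectors on the quiz data. head() and tail() accept a row count, a slice such as df[1:3] selects by position and excludes the end, and .loc picks a single cell by row and column label.

```python
import pandas as pd

quiz_scores = {
    'Quiz1': [32, 56, 56],
    'Quiz2': [78, 34, 32]}

df = pd.DataFrame(data=quiz_scores, index=['Mike', 'Susan', 'Amy'])

first_two = df.head(2)                   # top 2 rows instead of the default 5
last_one = df.tail(1)                    # bottom row
middle = df[1:3]                         # positions 1 and 2; position 3 is excluded
susan_quiz2 = df.loc['Susan', 'Quiz2']   # one cell by row label and column label

print(middle)
```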

18

Find all the schools with the name Pitt:

df2 = df.query("`School Name` == 'Pitt'")
df2.head()

Remove a column:

df.drop(columns='Starting Salary', inplace=True)

Or:

df = df.drop(columns='Starting Salary')

Drop a row:

df.drop(index=2, inplace=True)

Delete an entire row of data when any column has missing data:

df = df.dropna()
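A sketch of these operations on made-up data. Without inplace=True, each call returns a new DataFrame and leaves the original untouched, which is why the results are stored in new names here.

```python
import pandas as pd

# Made-up data: Temple has a missing salary.
df = pd.DataFrame({
    'School Name': ['Pitt', 'Temple', 'Drexel'],
    'Starting Salary': [50000, None, 55000],
})

pitt = df.query("`School Name` == 'Pitt'")        # rows whose name is exactly Pitt
complete = df.dropna()                            # drops the Temple row (missing salary)
no_row_2 = df.drop(index=2)                       # drops the row labeled 2
names_only = df.drop(columns='Starting Salary')   # drops the column

print(df.shape)   # the original is unchanged
```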

19

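Load descriptive statistics for the df = df.describe()

Load a tab-delimited file:

df2 = pd.read_csv(URL, sep='\t')

Replace function (assign the result back to the column):

df['School Name'] = df['School Name'].replace('-', ' -', regex=True)

Fill the NAs (with inplace=True, fillna returns None, so assign the result back instead):

df['Starting Salary'] = df['Starting Salary'].fillna(0)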
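A sketch of replace and fillna on made-up data, written in the assign-back style rather than with inplace=True on a selected column, which does not reliably change the original DataFrame in recent pandas versions.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    'School Name': ['Penn State-Erie', 'Pitt-Bradford'],
    'Starting Salary': [None, 45000],
})

# regex=True makes replace() match inside each string, not just whole values.
df['School Name'] = df['School Name'].replace('-', ' -', regex=True)

# Replace missing salaries with 0 and store the result back in the column.
df['Starting Salary'] = df['Starting Salary'].fillna(0)

print(df)
```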

How many unique school names are there:

len(df['School Name'].unique())

Show only the rows in the df that are duplicates:

duplicates = df.duplicated(subset='School Name')

Use the Boolean series to filter = df[duplicates]

Keep only the first occurrence of each school name:

df2 = df.drop_duplicates(subset='School Name', keep='first')
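A sketch on made-up data: duplicated() marks the second and later occurrences as True, and drop_duplicates(keep='first') removes exactly those rows.

```python
import pandas as pd

# Made-up data: Pitt appears twice.
df = pd.DataFrame({
    'School Name': ['Pitt', 'Temple', 'Pitt'],
    'Starting Salary': [50000, 48000, 51000],
})

duplicates = df.duplicated(subset='School Name')   # True only for the repeat
repeated_rows = df[duplicates]

df2 = df.drop_duplicates(subset='School Name', keep='first')

print(repeated_rows)
print(df2)
```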

Find schools whose name contains specified text:

pa_schools = df2['School Name'].str.contains('Pennsylvania')

Use the Boolean series to filter = df2[pa_schools]
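A sketch of the text filter on made-up school names. str.contains returns one True or False per row, and indexing with that series keeps only the True rows.

```python
import pandas as pd

# Made-up school names for illustration.
df2 = pd.DataFrame({'School Name': [
    'University of Pennsylvania',
    'Ohio State',
    'Pennsylvania State University',
]})

pa_schools = df2['School Name'].str.contains('Pennsylvania')
matches = df2[pa_schools]

print(matches)
```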

Overwrite instead of using inplace:

df2 = df2.sort_values('Starting Salary', ascending=False)

Fix one bad value:

df2.loc[2, 'Starting Salary'] = df2['Starting Salary'].mean()
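One thing to watch: the mean above still includes the bad value. A possible variation, sketched on made-up data, computes the mean from the other rows first and then sorts. The -1.0 entry and the choice to exclude it are assumptions for illustration.

```python
import pandas as pd

# Made-up data; -1.0 stands in for a bad entry in the row labeled 2.
df2 = pd.DataFrame({
    'School Name': ['Pitt', 'Temple', 'Drexel'],
    'Starting Salary': [50000.0, 60000.0, -1.0],
})

# Mean of the good rows only, so the bad value does not skew it.
good_mean = df2['Starting Salary'].drop(index=2).mean()
df2.loc[2, 'Starting Salary'] = good_mean

# sort_values returns a new DataFrame, so overwrite df2 to keep the result.
df2 = df2.sort_values('Starting Salary', ascending=False)

print(df2)
```

After sorting, rows keep their original labels, so df2.loc[2, ...] still refers to the Drexel row.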
