Python-Dev Cheat Sheet (DRAFT) by

A Python cheat sheet for data analysis, automation, and web development

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Random module

random.random()
random float in the range [0.0, 1.0)
random.uniform(a, b)
random float between a and b
random.randint(a, b)
random integer between a and b, both inclusive
random.randrange(0, 10, 2)
random number from [0, 2, 4, 6, 8]; the stop value 10 is excluded
random.choice(list)
random element from a list
random.choices(list, weights=None, k=2)
k random elements from a list, with replacement; weights is a list of relative weights that sets how likely each element is to be chosen
random.sample(list, k=2)
k unique elements (no replacement)
random.shuffle(list)
shuffles a list in place
random.seed(a=None)
seeds the generator; pass a fixed value, e.g. random.seed(42), to get the same results on every run
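
The calls above can be combined into a short script. This is a minimal sketch; the colors list and the seed value 42 are illustrative assumptions, not part of the cheat sheet.

```python
import random

random.seed(42)  # fixed seed so reruns produce the same sequence

colors = ["red", "green", "blue", "yellow"]  # hypothetical sample data

x = random.random()                 # float in [0.0, 1.0)
n = random.randint(1, 6)            # integer from 1 to 6, both ends included
even = random.randrange(0, 10, 2)   # one of 0, 2, 4, 6, 8 (10 is excluded)
pick = random.choice(colors)        # a single element
weighted = random.choices(colors, weights=[5, 1, 1, 1], k=3)  # elements may repeat
unique = random.sample(colors, k=2)  # two distinct elements

# Re-seeding with the same value replays the same sequence.
random.seed(42)
first = [random.random() for _ in range(3)]
random.seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```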

Types of errors

NameError
A name (variable, function, etc.) is used before it has been defined
TypeError
An operation or function is applied to a value of an inappropriate type, e.g. "a" + 1
IndexError
A sequence index is out of range
KeyError
A dictionary is accessed with a key that doesn't exist
ZeroDivisionError
A number is divided by 0
ValueError
A function receives an argument of the correct type but with an invalid value, e.g. int("abc")
AttributeError
An attribute or method doesn't exist on an object
ImportError / ModuleNotFoundError
A module failed to import; ModuleNotFoundError is the subclass raised when the module can't be found
FileNotFoundError
The file doesn't exist when trying to open it
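
Each error in the list above can be triggered on purpose, which is a quick way to learn what it looks like. The sketch below uses a hypothetical helper, trigger, and made-up file and module names chosen so that they do not exist.

```python
def trigger(action):
    """Call action and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


# Each lambda deliberately performs one invalid operation.
examples = {
    "NameError": lambda: undefined_name,  # noqa: F821 (name is never defined)
    "TypeError": lambda: "a" + 1,
    "IndexError": lambda: [1, 2, 3][5],
    "KeyError": lambda: {"a": 1}["b"],
    "ZeroDivisionError": lambda: 1 / 0,
    "ValueError": lambda: int("abc"),
    "AttributeError": lambda: "text".push("x"),
    "ModuleNotFoundError": lambda: __import__("no_such_module_xyz"),
    "FileNotFoundError": lambda: open("no_such_file_xyz.txt"),
}

results = {expected: trigger(action) for expected, action in examples.items()}
for expected, raised in results.items():
    print(f"{expected:<20} raised: {raised}")
```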

Pandas module

df = pd.DataFrame(dictionary)
To convert a dictionary into a pandas dataframe
df = pd.read_csv('file.csv')
To read a csv file into a dataframe
df = pd.read_excel('file.xlsx')
To read an excel file into a dataframe
df = pd.read_json('file.json')
To read a json file into a dataframe
df.to_csv('output.csv', index=False)
Write a dataframe to a csv file, without the index column
df.to_excel('output.xlsx')
Write a dataframe to an excel file
df.head(k)
First k rows, leave empty for five
df.tail(k)
Last k rows, leave empty for five
df.info()
Data types and non-null values
df.describe()
Summary statistics
df.shape
No. of rows and columns
df.columns
Column names
df.dtypes
Data types
df['col']
A specified column
df.iloc[k, l]
A specified cell by integer position (row k, column l); leave l empty for an entire row
df.loc[k, 'col']
A specified cell by label: k is the index label, 'col' is the column name
df[0:5]
Slicing rows
df[df['col'] > 25]
Filter data by condition
df[(df['col'] > 25) & (df['Age'] < 40)]
Filter data by multiple conditions; wrap each condition in parentheses
df[df['Name'].isin(['Alice'])]
Filter by values
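
The selection and filtering entries above can be tried on a small table. This is a minimal sketch with made-up data; the column names Name and Age are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Age": [24, 31, 38, 45],
})

cell_by_position = df.iloc[1, 0]   # row position 1, column position 0
cell_by_label = df.loc[1, "Age"]   # index label 1, column "Age"

# & binds more tightly than > and <, so each comparison needs its own parentheses.
in_range = df[(df["Age"] > 25) & (df["Age"] < 40)]
named = df[df["Name"].isin(["Alice", "Dan"])]

print(in_range)
print(named)
```

Without the parentheses, Python evaluates 25 & (df['Age'] < 40) first, which is not the intended comparison.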
df.rename(columns={'old': 'new'})
Renaming a column
df.drop(columns=['Col1', 'Col2'])
Dropping columns
df.drop(index=[0, 1])
Dropping rows
df[col].sum()
Sum of values in col
df[col].mean()
Mean of values in col
df[col].value_counts()
Count of each unique value in col
df.groupby(col).mean()
Mean of each group
df.isnull()
Returns a boolean dataframe that is True where a value is null
df.isnull().sum()
No. of null values per column
df.dropna()
Drop the rows with null values
df.fillna(k)
Fill the missing values with value k
df['col'] = df['col'].str.strip()
Remove leading and trailing whitespace
df['col'] = df['col'].str.lower()
Convert text to lowercase
df['col'] = pd.to_datetime(df['col'])
Convert to datetime
df.sort_values('Age')
Sort data by age
df.sort_values(['Age', 'Name'])
Sort data by multiple columns
df.reset_index(drop=True)
Reset index and discard the old one
pd.concat([df1, df2])
Appending rows
pd.merge(df1, df2, on='ID')
Joining data by column value
pd.merge(df1, df2, how='left', on='ID')
Left joining data by column value
df.pivot_table(index='Gender', values='Age', aggfunc='mean')
Create a pivot table with mean of the values categorized by index
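
A typical cleaning and summarizing pass chains several of the entries above. The sketch below uses two small made-up tables, people and cities; all names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with messy text and missing values.
people = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": [" Alice ", "BOB", "cara", None],
    "Gender": ["F", "M", "F", "M"],
    "Age": [30.0, None, 40.0, 50.0],
})
cities = pd.DataFrame({"ID": [1, 2, 5], "City": ["Paris", "Lima", "Oslo"]})

missing = people.isnull().sum()  # null count per column

# Clean the text column and fill missing ages with the column mean.
people["Name"] = people["Name"].str.strip().str.lower()
people["Age"] = people["Age"].fillna(people["Age"].mean())

mean_age = people.groupby("Gender")["Age"].mean()

# A left join keeps every row of people; unmatched IDs get a null City.
joined = pd.merge(people, cities, how="left", on="ID")

pivot = people.pivot_table(index="Gender", values="Age", aggfunc="mean")
print(joined)
print(pivot)
```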

Matplotlib module

plt.plot(x, y, color='red', linestyle='--', marker='o', label='line 1')
Line plot with color red, dashed lines, o marker labeled as 'line 1'
plt.title("Title")
Set title of the chart
plt.xlabel("x-axis")
Label of x-axis
plt.ylabel("y-axis")
Label of y-axis
plt.legend()
Show legend
plt.grid(True)
Show grid
plt.show()
Display the chart
plt.figure(figsize=(6, 4))
Set figure size
plt.subplot(2, 1, 1)
2 rows, 1 column, 1st plot
plt.tight_layout()
Avoid overlap
plt.scatter(x, y)
Scatter plot
plt.bar(x, y)
Bar plot
plt.barh(x, y)
Horizontal bar plot
plt.hist(list, bins=5)
Histogram plot
plt.pie(data_list, labels=label_list, autopct='%1.1f%%')
Pie chart plot
plt.style.use('ggplot')
Set global chart style
plt.style.available
Show all chart styles
plt.savefig('plot.pdf', dpi=300)
Save chart as pdf with resolution
plt.savefig('plot.png')
Save chart as png
plt.text(2, 20, "Sample Text")
Add text at x=2, y=20
plt.annotate("Important", xy=(2, 20), xytext=(3, 25), arrowprops=dict(facecolor='black'))
Annotate the point xy with text placed at xytext, connected by an arrow
plt.xscale('log')
Logarithmic x-axis
plt.yscale('log')
Logarithmic y-axis
plt.xlim(0, 5)
X-axis limits
plt.ylim(0, 5)
Y-axis limits
plt.xticks([1, 2, 3])
Custom ticks in x-axis
plt.yticks([1, 2, 3])
Custom ticks in y-axis
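
Several of the calls above are usually combined into one figure. This is a minimal sketch with made-up data; the Agg backend and the temporary output path are assumptions made so that the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Hypothetical sample data.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig = plt.figure(figsize=(6, 4))

plt.subplot(2, 1, 1)  # 2 rows, 1 column, first plot
plt.plot(x, y, color="red", linestyle="--", marker="o", label="squares")
plt.title("Line")
plt.legend()
plt.grid(True)

plt.subplot(2, 1, 2)  # second plot
plt.bar(x, y)
plt.xlabel("x-axis")
plt.ylabel("y-axis")

plt.tight_layout()  # keep titles and labels from overlapping

out_path = os.path.join(tempfile.gettempdir(), "cheatsheet_plot.png")
plt.savefig(out_path, dpi=150)
```

In an interactive session, plt.show() would replace the savefig call.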

Plotly module

import plotly.graph_objects as go
import plotly.express as px
df = px.data.gapminder()
Load the Gapminder dataset as a pandas dataframe
px.line(df[df['country'] == 'India'], x='year', y='gdpPercap', title='GDP over time')
Line plot of the rows for one country, with x=year, y=gdpPercap and the title 'GDP over time'
px.bar(x=['A', 'B'], y=[10, 20], title='Bar Plot')
Bar plot
px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', title='GDP vs Life Expectancy')
Scatter plot
px.scatter(df, x='gdpPercap', y='lifeExp', size='pop', color='continent', hover_name='country', log_x=True)
Bubble chart (marker size scales with population)
px.choropleth(df[df['year']==2007], locations="iso_alpha", color="lifeExp", hover_name="country")
Map plot (Choropleth)
fig.update_layout(title='New Title', xaxis_title='X Axis', yaxis_title='Y Axis', template='plotly_dark')
To customize layout
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6], mode='lines+markers', name='Line'))
Line plot
fig = go.Figure(go.Bar(x=['A', 'B'], y=[10, 15]))
Bar plot
go.Figure(go.Pie(labels=['A', 'B'], values=[30, 70]))
Pie plot
fig.write_html("plot.html")
Save as html file
fig.write_image("plot.png")
Save as a static image file (requires the kaleido package)
fig.update_layout(hovermode='x unified')
Tooltip follows x
fig.update_traces(marker=dict(size=10))
Change marker size
fig.update_layout(dragmode='zoom')
Default zoom tool
fig.update_layout(template='plotly_dark')
Update the style of theme
px.scatter_geo(px.data.gapminder().query("year==2007"), locations="iso_alpha", color="continent", size="pop")
Map visualizations
from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=2)
To set subplots
fig.add_trace(go.Scatter(x=[1, 2], y=[3, 4]), row=1, col=1)
Add trace in a subplot
 

Types of data structures

Lists
Ordered and mutable; support indexing, slicing and extending. Syntax: my_list = [1, 1.21, "hello", True]
Tuples
Ordered and immutable; support indexing and slicing. Syntax: my_tuple = (1, 10, "hello")
Sets
Unordered, with no duplicate elements. Key operations are add(), remove(), union(), intersection(), difference(). Syntax: my_set = {1, 2, 3, 3} (stored as {1, 2, 3})
Dictionary
Mutable key-value mapping; values are accessed by key. Common operations are get(), items(), keys(), values(), update(). Syntax: my_dict = {"name": "Alice", "age": 30, "city": "New York"}
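
The four structures above behave differently under the same kinds of operations. The sketch below reuses the example values from the entries; the added items and the "country" key are assumptions for illustration.

```python
# List: ordered and mutable.
my_list = [1, 1.21, "hello", True]
my_list.append("new")
my_list.extend([2, 3])
first_two = my_list[:2]  # slicing returns a new list

# Tuple: ordered but immutable, so item assignment fails.
my_tuple = (1, 10, "hello")
tuple_is_immutable = False
try:
    my_tuple[0] = 99
except TypeError:
    tuple_is_immutable = True

# Set: duplicates are dropped on creation.
my_set = {1, 2, 3, 3}
my_set.add(4)
common = my_set.intersection({3, 4, 5})
only_left = my_set.difference({3, 4, 5})

# Dictionary: values are looked up by key.
my_dict = {"name": "Alice", "age": 30, "city": "New York"}
country = my_dict.get("country", "unknown")  # default avoids a KeyError
my_dict.update({"age": 31})

print(first_two, tuple_is_immutable, common, only_left, country, my_dict["age"])
```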

Pytest module

assert result == k
Passes if result equals the expected value k; otherwise the test fails
@pytest.fixture
Defines a fixture: reusable setup (and optional teardown) code that runs before or after a test
@pytest.mark.parametrize("a, b, result", [(1, 2, 3), (4, 5, 9)])
Runs the test once per tuple, passing each tuple's values in as a, b and result
@pytest.mark.skip(reason="Not implemented yet")
Skip a particular test
@pytest.mark.skipif(condition, reason="...")
Skip the test if the condition is true
@pytest.mark.xfail
Mark a test that is expected to fail
pytest.raises()
Context manager that checks a block of code raises the given exception type, e.g. with pytest.raises(ValueError):
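
The decorators above fit together in a single test file. This is a minimal sketch; the functions add and divide and the fixture numbers are hypothetical stand-ins for the code under test.

```python
import pytest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


def divide(a, b):
    """Hypothetical function under test."""
    return a / b


@pytest.fixture
def numbers():
    # Reusable setup; pytest passes the return value to tests that name it.
    return [1, 2, 3]


def test_sum_with_fixture(numbers):
    assert sum(numbers) == 6


@pytest.mark.parametrize("a, b, result", [(1, 2, 3), (4, 5, 9)])
def test_add(a, b, result):
    # Runs twice, once for each tuple.
    assert add(a, b) == result


@pytest.mark.skip(reason="Not implemented yet")
def test_future_feature():
    assert False  # never executed while the skip marker is present


def test_divide_by_zero():
    # Passes only if the block raises ZeroDivisionError.
    with pytest.raises(ZeroDivisionError):
        divide(1, 0)
```

Saved as, say, test_example.py, this would be run with the pytest command from the same directory.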

Numpy module

np.array([[1, 2, 3], [4, 5, 6]])
Creating a 2D array from a list of lists
np.zeros((3, 3))
3x3 array of zeros
np.ones((3, 3))
3x3 array of ones
np.full((2, 2), 7)
2x2 array of sevens
np.eye(3)
Identity matrix 3x3
np.arange(0, 10, 2)
An array of this: [0, 2, 4, 6, 8]
np.linspace(0, 1, 5)
5 evenly spaced values from 0 to 1, both inclusive
arr.shape
Dimensions of the array
arr.ndim
No. of dimensions
arr.size
Total no. of elements
arr.dtype
Data type
arr.reshape((2, 3))
Reshape an array to 2x3
arr.ravel()
Flatten an array to 1D
arr.T
Transpose the array
np.add(a, b)
a + b
np.subtract(a, b)
a - b
np.multiply(a, b)
a * b
np.divide(a, b)
a / b
np.power(a, 2)
a to the power of 2
np.sqrt(a)
Square root of a
np.exp(a)
Exponential value of a
np.log(a)
Natural log of a
np.mean(list)
Mean of the list
np.median(list)
Median of the list
np.std(list)
Standard deviation of the list
np.sum(list)
Sum of the list
np.max(list)
Maximum value in a list
np.min(list)
Minimum value in a list
np.argmax(list)
Index of maximum value
np.argmin(list)
Index of minimum value
np.concatenate([a, b])
Join arrays
np.vstack([a, b])
Stack vertically
np.hstack([a, b])
Stack horizontally
np.split(a, 3)
Split the array into 3 parts
np.unique(a)
Unique elements of the array
np.random.rand(2, 2)
A 2x2 array of random floats in [0, 1)
np.random.randn(2, 2)
A 2x2 array of random values drawn from the standard normal distribution
np.random.randint(0, 10, size=5)
A 1D array of 5 random integers from 0 to 9 (10 is excluded)
np.isnan(a)
Check for NaN values
np.isinf(a)
Check for Inf values
np.nan_to_num(a)
Convert NaN to 0
np.clip(a, 0, 1)
Limit values between 0 to 1
np.where(a > 0, 1, 0)
Conditional values
np.cumsum(a)
Cumulative sum
np.cumprod(a)
Cumulative product
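
A few of the NumPy entries above, applied to small arrays. This is a minimal sketch; the sample values, and the axis argument passed to np.mean, are assumptions that do not appear in the cheat sheet.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # nested lists give a 2D array
evens = np.arange(0, 10, 2)           # stop value 10 is excluded
grid = np.linspace(0, 1, 5)           # stop value 1 is included

reshaped = a.reshape((3, 2))
flat = a.ravel()                      # 1D view of the same six elements
total = np.sum(a)
col_means = np.mean(a, axis=0)        # mean of each column

stacked = np.vstack([a, a])           # rows of a, then rows of a again
parts = np.split(flat, 3)             # three equal pieces of two elements

# Cleaning: replace NaN with 0, then limit values to the range [0, 1].
raw = np.array([-0.5, 0.2, np.nan, 1.7])
cleaned = np.clip(np.nan_to_num(raw), 0, 1)
flags = np.where(raw > 0, 1, 0)       # a comparison with NaN is False
running = np.cumsum([1, 2, 3, 4])

print(evens, grid, cleaned, flags, running)
```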

Bokeh module

from bokeh.plotting import figure, show
from bokeh.io import output_file, output_notebook
from bokeh.layouts import column, row
output_file("plot.html")
Output to html file
output_notebook()
Output to Jupyter notebook
p = figure(title="Simple Line", x_axis_label='x', y_axis_label='y')
Label the figure
p.line([1, 2, 3], [4, 6, 2])
Line plot
show(p)
Show the chart
p.circle(x, y, size=10)
Scatter plot
p.vbar(x=x, top=y, width=0.5)
Vertical bar plot
p.hbar(y=x, right=y, height=0.5)
Horizontal bar plot; hbar takes y, right and height rather than x, top and width
p.triangle(x, y, size=12, color="green")
Triangle markers; other marker glyphs are available, e.g. square and diamond
p.title.text = "Custom Title"
Set title
p.xaxis.axis_label = "X Axis"
Label x-axis
p.yaxis.axis_label = "Y Axis"
Label y-axis
p.background_fill_color = "lightgray"
Set background color
p.border_fill_color = "whitesmoke"
Set border color
p.outline_line_color = "black"
Set outline line color
p.line(x, y, legend_label="My Line", line_width=2)
Define legend_label for legend
p.legend.location = "top_left"
Set legend location
p.legend.click_policy = "hide"
Set interactive legend: clicking an entry hides its glyph
layout = row(p1, p2)
To set layout of a row
layout = column(p1, p2)
To set layout of a column
show(layout)
Show layout
from bokeh.models import ColumnDataSource
source = ColumnDataSource(data={'x': [1, 2, 3], 'y': [4, 6, 5]})
Set a data source
p.circle(x='x', y='y', source=source, size=10)
Plot a circle chart from data source
from bokeh.io.export import export_png
export_png(p, filename="plot.png")
Export chart to png file
p1.x_range = p2.x_range
Link x-axis
p1.y_range = p2.y_range
Link y-axis
from bokeh.embed import components
script, div = components(p)
Use in html templates
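
The data source, legend, layout, linking and embedding entries above can be combined into one layout. This is a minimal sketch with made-up data; it ends with components so that nothing is opened in a browser, and the plot names p1 and p2 follow the entries above.

```python
from bokeh.embed import components
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical sample data shared by both plots.
source = ColumnDataSource(data={"x": [1, 2, 3], "y": [4, 6, 5]})

p1 = figure(title="Line", x_axis_label="x", y_axis_label="y")
p1.line(x="x", y="y", source=source, legend_label="My Line", line_width=2)
p1.legend.location = "top_left"
p1.legend.click_policy = "hide"  # clicking the legend entry hides the line

p2 = figure(title="Bars")
p2.vbar(x="x", top="y", width=0.5, source=source)
p2.x_range = p1.x_range  # shared range: panning one plot moves the other

layout = row(p1, p2)

# components returns a <script> block and a <div> placeholder for a template.
script, div = components(layout)
```

In a script or notebook, show(layout) would display the result instead.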