Clean ML code and PEP8 guidelines Cheat Sheet

Guidelines from the book "Clean Machine Learning Code" by Moussa Taifi and from PEP8

Optimizing names

Reveal an intention.
Key concepts a name should communicate:
- Why it exists?
- What does it do?
- How is it used?
Differences between variable names should be as close to the beginning of the name as possible.
Avoid noisy labels. Use a mature optional typing system instead.
Make Siri say it. Avoid abbreviations and make your names self-explanatory.
Replace datatype suffixes in variable names with typing information: prefer features: DataFrame over features_df.
No magic numbers. Use constants.
Be consistent! Pick a single word per concept, and use it everywhere it fits.
Use technical names in the backend, and domain names as you get closer to the customer.
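
As a minimal sketch of several of these naming rules together, the snippet below replaces magic numbers with named constants and moves datatype information out of the names and into type annotations. The constants, the function, and the session scenario are hypothetical and chosen only for illustration.

```python
# Named constants replace magic numbers such as 86400 and 30.
SECONDS_PER_DAY = 86_400
MAX_SESSION_GAP_DAYS = 30


def is_session_expired(seconds_since_last_event: float) -> bool:
    """Return True when the gap since the last event exceeds the allowed window."""
    return seconds_since_last_event > MAX_SESSION_GAP_DAYS * SECONDS_PER_DAY


# The annotation carries the datatype, so the name does not need a suffix
# such as gaps_list.
session_gaps: list[float] = [SECONDS_PER_DAY, 45 * SECONDS_PER_DAY]
expired_flags = [is_session_expired(gap) for gap in session_gaps]
```

Compare this with a version named chk(t) that tests t > 2592000: the reader would have to reverse-engineer both the intent and the unit.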

Optimize functions

Small is beautiful. 3, 4, maybe 5 lines max!
Stop at maximum 3 arguments!
Avoid boolean arguments. They signal that the function does more than one thing.
A lengthy list of configuration arguments that share a concept should be grouped into a configuration object.
Command Query Separation (CQS):
- Command: a function that changes some external “state”.
- Query: a function that returns some “information”.
Avoid side effects in feature engineering pipelines.
Make temporal couplings explicit.
DRY (Don't Repeat Yourself): eliminate duplicates, doubles, and homologues.
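
The sketch below shows one possible way to apply two of these rules: grouping related settings into a configuration object, and keeping commands apart from queries. SplitConfig, split_rows, and TrainingLog are hypothetical names invented for this example, not part of the book or of any library.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitConfig:
    """Hypothetical config object grouping the settings of one concept."""

    test_fraction: float = 0.2
    seed: int = 42


def split_rows(rows: list[int], config: SplitConfig) -> tuple[list[int], list[int]]:
    """Query: return (train, test) lists without modifying `rows`."""
    shuffled = rows.copy()
    random.Random(config.seed).shuffle(shuffled)
    test_size = int(len(shuffled) * config.test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


class TrainingLog:
    """Hypothetical holder of external state."""

    def __init__(self) -> None:
        self._losses: list[float] = []

    def record_loss(self, loss: float) -> None:
        """Command: change state, return nothing."""
        self._losses.append(loss)

    def latest_loss(self) -> float:
        """Query: return information, change nothing."""
        return self._losses[-1]
```

With the settings grouped, split_rows takes two arguments instead of three or more, and a caller can tell from the return type alone whether a call changes state.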

Naming conventions

Functions: Use verbs to represent actions. Use the is_ prefix for functions that return a boolean. Examples: get_features, fit, is_completed
Variables: See the scope length guidelines. Examples: x, var, my_variable
Classes: Use nouns to represent objects. Examples: Model, DataLoader
Methods: Lowercase word(s). Separate with underscores. Examples: class_method, method
Constants: Uppercase single letter or word(s). Separate with underscores.
Modules: Short lowercase word(s). Separate with underscores. Example: my_mod
Packages: Short lowercase word(s). Do NOT separate with underscores. Examples: package, mypackage
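
A short sketch of how these conventions might look side by side. The class and method names reuse the examples above; MAX_EPOCHS and the method bodies are assumptions made up for illustration.

```python
MAX_EPOCHS = 3  # Constant: uppercase words separated by underscores.


class Model:  # Class: a noun.
    def __init__(self) -> None:
        self.epochs_run = 0  # Variable: lowercase words with underscores.

    def fit(self) -> None:  # Method: a lowercase verb.
        self.epochs_run = MAX_EPOCHS

    def is_completed(self) -> bool:  # The is_ prefix signals a boolean result.
        return self.epochs_run >= MAX_EPOCHS
```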

The Scope Length Guidelines

How to avoid side effects in DataFrames

Copy any data coming into the function and return a fresh copy after all the modifications.
Return only the transformed columns as a separate object.
Be append-only inside functions: add new columns instead of modifying existing ones.
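
The three options can be sketched as follows, assuming the DataFrames are pandas DataFrames (the cheat sheet does not name the library). The column names total_price, quantity, and price_per_unit are hypothetical.

```python
import pandas as pd


def add_price_per_unit(orders: pd.DataFrame) -> pd.DataFrame:
    """Copy the input, append one new column, and return the fresh copy."""
    result = orders.copy()
    # Append-only: a new column is added, no existing column is overwritten.
    result["price_per_unit"] = result["total_price"] / result["quantity"]
    return result


def compute_price_per_unit(orders: pd.DataFrame) -> pd.Series:
    """Return only the transformed column as a separate object."""
    return (orders["total_price"] / orders["quantity"]).rename("price_per_unit")


orders = pd.DataFrame({"total_price": [10.0, 9.0], "quantity": [2, 3]})
enriched = add_price_per_unit(orders)
```

In both functions the caller's orders frame keeps its original two columns, so the order in which pipeline steps run cannot silently change their inputs.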

Handling Exceptions

Promote the Happy Path with exceptions.
Separate the Happy Path from the outliers.
Don’t reuse unrelated exception types.
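
One way to read these three rules is sketched below: a dedicated exception type keeps the main logic free of error checks, and the outlier is handled in a single separate place. MissingFeatureError and the helper functions are hypothetical names used only for this example.

```python
class MissingFeatureError(Exception):
    """Hypothetical dedicated type, instead of reusing an unrelated one."""


def get_feature(record: dict[str, float], name: str) -> float:
    if name not in record:
        raise MissingFeatureError(f"feature {name!r} is missing")
    return record[name]


def mean_feature(records: list[dict[str, float]], name: str) -> float:
    # Happy path only: no error codes or None checks between the steps.
    values = [get_feature(record, name) for record in records]
    return sum(values) / len(values)


def safe_mean_feature(
    records: list[dict[str, float]], name: str, default: float
) -> float:
    # The outlier is handled here, away from the happy path.
    try:
        return mean_feature(records, name)
    except MissingFeatureError:
        return default
```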

Programming recommendations

Don’t compare Boolean values to True or False using the equivalence operator.
Use the fact that empty sequences are falsy in if statements.
Use is not rather than not ... is in if statements.
Don’t use if x: when you mean if x is not None:.
Use .startswith() and .endswith() instead of slicing strings.
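
A small sketch covering three of these recommendations in one function. The function name, the "#" comment marker, and the limit parameter are assumptions made for the example.

```python
from typing import Optional


def select_data_lines(lines: list[str], limit: Optional[int] = None) -> list[str]:
    if not lines:  # An empty list is falsy; no need for len(lines) == 0.
        return []
    # startswith() is clearer and less error-prone than line[:1] == "#".
    data_lines = [line for line in lines if not line.startswith("#")]
    if limit is not None:  # A plain `if limit:` would wrongly ignore limit=0.
        data_lines = data_lines[:limit]
    return data_lines
```

The limit=0 case is the reason for the explicit None check: zero is a legitimate value here, yet it is falsy.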

Blank lines

Surround top-level functions and classes with two blank lines.
Surround method definitions inside classes with a single blank line.
Use blank lines sparingly inside functions to show clear steps.
In multi-line formulas, always break before binary operators.
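
The layout rules above might look like this in practice. The cost calculation and all names are hypothetical; only the spacing and the line breaks matter.

```python
BASE_FEE = 2.0


def compute_total_cost(unit_price: float, quantity: int, discount: float) -> float:
    subtotal = unit_price * quantity

    # Each line of the formula starts with its operator.
    total = (
        subtotal
        + BASE_FEE
        - discount
    )
    return total


class CostReport:
    def __init__(self) -> None:
        self.costs: list[float] = []

    def add_cost(self, cost: float) -> None:
        self.costs.append(cost)
```

Two blank lines separate the top-level definitions, one blank line separates the methods, and a single blank line inside the function marks the step from subtotal to total.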

Whitespace in Expressions and Statements

Surround the following operators with a single space on either side:
- Assignment operators (=, +=, -=, and so forth). Exception: when = is used to assign a default value to a function argument, do not surround it with spaces.
- Comparisons (==, !=, >, <, >=, <=) and (is, is not, in, not in).
- Booleans (and, not, or).
When an expression mixes several operators, it is better to add whitespace only around the operators with the lowest priority, especially in mathematical formulas.
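
A brief sketch of these spacing rules, using hypothetical functions. The parameters are left unannotated on purpose: PEP 8 asks for spaces around = when a default value is combined with a type annotation.

```python
def scale(value, factor=2.0):  # No spaces around = for the default value.
    result = value * factor  # Single spaces around the assignment operator.
    return result


def hypotenuse_squared(a, b):
    # Whitespace only around +, the operator with the lowest priority.
    return a*a + b*b


def is_valid_score(score):
    # Single spaces around comparison and boolean operators.
    return score is not None and 0 <= score <= 100
```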


Indentation

Spaces are the preferred indentation method.
Tabs should be used solely to remain consistent with code that is already indented with tabs.
Dictionaries should use the default formatting rules, with no extra indentation. Don't try to keep the values aligned.
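
One reading of the dictionary rule, as a small sketch with made-up hyperparameter names: each entry gets the standard four-space indent and a single space after the colon, with no padding to line the values up.

```python
hyperparameters = {
    "learning_rate": 0.01,
    "epochs": 10,
    "batch_size": 32,
}
```

Aligning the values in a column would force every line to be re-spaced whenever a longer key is added, which makes diffs noisy.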


Comments

Don’t Hide Bad Code Behind Comments. Let Code Explain Itself!
Limit the line length of comments and docstrings to 72 characters.
Use complete sentences, starting with a capital letter.
Make sure to update comments if you change your code.
Indent block comments to the same level as the code they describe.
Start each line with a # followed by a single space.
Separate paragraphs by a line containing a single #.
Write inline comments on the same line as the statement they refer to.
Separate inline comments by two or more spaces from the statement.
Don’t use inline comments to explain the obvious.
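
The block and inline comment rules could be applied as in the following sketch. The normalize function is a hypothetical example; the comments explain why the code does something rather than restating what it does.

```python
def normalize(values: list[float]) -> list[float]:
    # Rescale the values to the range 0 to 1 with min-max scaling.
    #
    # A constant input has a range of zero, so it is mapped to all
    # zeros instead of being divided by that range.
    lowest = min(values)
    value_range = max(values) - lowest
    if value_range == 0:
        return [0.0 for _ in values]  # Avoids a ZeroDivisionError.
    return [(value - lowest) / value_range for value in values]
```

The block comment sits at the same indentation as the code it describes, its paragraphs are separated by a line holding a single #, and the inline comment is set off by two spaces.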

Tips and tricks

Use an IDE or a linter, a program that analyzes code and flags errors (e.g., pycodestyle, flake8).
Use an autoformatter to refactor your code to conform with PEP 8 automatically (e.g., black, autopep8, yapf).

Maximum Line Length

PEP 8 suggests lines should be limited to 79 characters.
Wrap long lines by using Python's implied line continuation inside parentheses, brackets, and braces.
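
Implied line continuation can be sketched as follows. The function and the column names are hypothetical; the point is that an open parenthesis or bracket lets the expression span several lines without backslashes.

```python
def build_summary(model_name: str, accuracy: float, epochs: int) -> str:
    # Adjacent string literals inside parentheses are joined into one string.
    return (
        f"model={model_name} "
        f"accuracy={accuracy:.2f} "
        f"epochs={epochs}"
    )


selected_columns = [
    "customer_id",
    "signup_date",
    "monthly_spend",
]
```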

