Tom Wojcik personal blog

Pre-commit hooks


How PEP8, linters and formatters work together

PEP8, the Python style guide, was introduced in 2001. Nowadays developers use a few tools to follow its directives.

Linters are one of these tools. They run basic quality checks against your code and go beyond basic syntax rules. For instance, with the flake8 linter you can use the --max-complexity flag to make sure that no function is too complex (McCabe complexity). Linters only tell you where the issue is; you have to fix it yourself.
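
To get a feel for what McCabe complexity measures, here is a rough sketch that counts branching constructs with the standard ast module. The function name approx_complexity and the chosen node list are my own simplification; the mccabe plugin that flake8 uses builds a proper control-flow graph, so its numbers can differ.

```python
import ast

# Node types that open an additional path through the code.
# This list is a simplification, not the exact rule set of the mccabe plugin.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.BoolOp)


def approx_complexity(source: str) -> int:
    """Return 1 plus the number of branching nodes found in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


straight = "def f(x):\n    return x\n"
branchy = (
    "def g(x):\n"
    "    if x > 0:\n"
    "        return 1\n"
    "    elif x < 0:\n"
    "        return -1\n"
    "    return 0\n"
)

print(approx_complexity(straight))  # 1: a single path
print(approx_complexity(branchy))   # 3: the if and the elif each add a path
```

With flake8 itself you would pass something like --max-complexity 10 and let the tool flag any function above that threshold.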

Formatters, on the other hand, do the “easy” stuff for you. If a line is too long, the formatter will break it. Even though formatters change the code for you, they aim to leave its behavior untouched: in the Python world, a formatter such as black checks that processing a file has not changed its AST (abstract syntax tree).
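
The AST check can be illustrated with the standard library alone. This is only a sketch of the idea, assuming a plain comparison of ast.dump output; the helper name same_ast is hypothetical and a real formatter's safety check is more involved.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """True when both sources parse to the same syntax tree.

    ast.dump() leaves out line and column numbers by default,
    so pure layout changes do not affect the comparison.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


before = "x = {'a':1,\n 'b':2}"
after = 'x = {"a": 1, "b": 2}'
changed = 'x = {"a": 1, "b": 3}'

print(same_ast(before, after))    # True: only quotes and whitespace differ
print(same_ast(before, changed))  # False: a value changed
```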

Both help you follow PEP8 principles. Linters tell you which rules have been broken (and more), while formatters make the formatting changes for you.

Newcomers tend to write in the style they’re familiar with from other languages, often camelCase or PascalCase. Even though PEP8 recommends snake_case for function and variable names (and reserves CapWords for class names), nothing stops you from actually doing it.
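
If you inherit code with camelCase names, a small helper can suggest the snake_case spelling. This is a sketch with a hypothetical function name, to_snake_case; it handles simple names only and will split acronyms such as HTTPServer letter by letter.

```python
import re


def to_snake_case(name: str) -> str:
    """Insert an underscore before every capital letter except the first, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


print(to_snake_case("myNotSoShortVariableName"))  # my_not_so_short_variable_name
print(to_snake_case("PascalCase"))                # pascal_case
```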

Flake8 linter

The flake8 package combines a few tools (pyflakes, pycodestyle and mccabe) to enforce PEP8 in your codebase.

pip install flake8

Running flake8 with the default configuration against this code (saved as example.py)

from math import floor

def foo(myNotSoShortVariableName: int): return myNotSoShortVariableName / 2 if myNotSoShortVariableName % 2 == 0 else myNotSoShortVariableName // 2


if __name__ == '__main__':
    foo(4.2)

flake8 example.py

results in these three violations

example.py:1:1: F401 'math.floor' imported but unused
example.py:3:1: E302 expected 2 blank lines, found 1
example.py:3:80: E501 line too long (149 > 79 characters)
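
A rule like E501 is conceptually simple. The sketch below mimics the report format shown above; the function check_line_length is my own illustration and not how flake8 or pycodestyle implement the check internally.

```python
from typing import List


def check_line_length(path: str, source: str, max_length: int = 79) -> List[str]:
    """Report every line longer than max_length in a flake8-like format."""
    reports = []
    for row, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            reports.append(
                f"{path}:{row}:{max_length + 1}: E501 line too long "
                f"({len(line)} > {max_length} characters)"
            )
    return reports


source = "x = 1\n" + "y = " + "1" * 100 + "\n"
for report in check_line_length("example.py", source):
    print(report)  # example.py:2:80: E501 line too long (104 > 79 characters)
```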

Black formatter

Strings can be written with either single or double quotes. PEP8 says:

In Python, single-quoted strings and double-quoted strings are the same. This PEP does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.

But the only way to enforce it in your code is to install an additional flake8 plugin, flake8-quotes, which aims to keep quotation marks consistent. The problem with this approach is that whenever flake8 doesn’t let you configure something, you need to install a plugin for it. You end up with a configuration file and multiple plugins. Then you create another project, and copying all these linting rules 1:1 might not be your highest priority, even though you want to keep things consistent within your organization. Also, what if a new team member says they don’t like double quotation marks? Now it’s your preference against theirs.

The black formatter helps you keep things consistent without having to discuss conventions, as it’s an opinionated formatter. For instance, it defaults to double quotation marks, because that’s just easier when the string itself contains an apostrophe. Black seems to have become the default way of handling formatting in the Python ecosystem. It enforces its own formatting rules, but they are still compliant with PEP8, so it’s not a new standard.
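
The quoting advice from PEP8 is easy to see in a few lines. The variable names below are made up for illustration; the point is that the spellings are equivalent and only readability differs.

```python
# The same text written twice: escaping the apostrophe vs. switching the outer quotes.
escaped = 'It\'s easier with double quotes'
apostrophe = "It's easier with double quotes"

# When the text itself contains double quotes, single quotes outside avoid backslashes.
speech = 'She said "hello"'

print(escaped == apostrophe)  # True
print(speech)
```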

pip install black

Let’s consider the same example as before. Black will automatically

  • break the line that is too long
  • add a second blank line between your import statement and function declaration
  • replace the single quotes with double quotes

It turns

from math import floor

def foo(myNotSoShortVariableName: int): return myNotSoShortVariableName / 2 if myNotSoShortVariableName % 2 == 0 else myNotSoShortVariableName // 2


if __name__ == '__main__':
    foo(4.2)

to

from math import floor


def foo(myNotSoShortVariableName: int):
    return (
        myNotSoShortVariableName / 2
        if myNotSoShortVariableName % 2 == 0
        else myNotSoShortVariableName // 2
    )


if __name__ == "__main__":
    foo(4.2)

Much better, isn’t it?

So if black does what flake8 requires, why use both? Black’s formatting rules cover only a subset of PEP8, and the two tools have different goals. If I run flake8 against the formatted code, one violation remains, because black won’t add or remove code for you (which is good).

example.py:1:1: F401 'math.floor' imported but unused
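
This remaining warning shows the difference in goals: spotting an unused import requires analysing the code, not reformatting it. Here is a minimal sketch of such a check using the ast module. The function unused_imports is hypothetical and much cruder than what pyflakes does (it ignores scopes, __all__ and re-exports).

```python
import ast
from typing import List


def unused_imports(source: str) -> List[str]:
    """Return imported names that are never referenced in the module."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import x as y" binds "y".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


example = "from math import floor\n\n\ndef foo(x):\n    return x // 2\n"
print(unused_imports(example))  # ['floor']
```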

MyPy if you want to squeeze the best out of your code base

I am a huge static typing advocate. It lets you catch mistakes sooner, and it’s nice that type annotations remain optional, so you can add them only where they help.

MyPy is the most popular optional static type checker for Python. It has matured a lot, as it was developed alongside the new Python 3.x typing module.

pip install mypy

Let’s remove the unused import that flake8 was warning us about. Now the code looks like this.

def foo(myNotSoShortVariableName: int):
    return (
        myNotSoShortVariableName / 2
        if myNotSoShortVariableName % 2 == 0
        else myNotSoShortVariableName // 2
    )


if __name__ == "__main__":
    foo(4.2)

flake8 doesn’t complain. black has nothing to format. What else is there that needs to be done?

mypy example.py

yields

example.py:10: error: Argument 1 to "foo" has incompatible type "float"; expected "int"

The parameter myNotSoShortVariableName was supposed to be an int, but a float (4.2) has been passed in the example. It’s obviously a toy example, so it might not seem like a huge improvement, but once you commit to using mypy in your codebase you will see the gains. It’s not a simple tool though, so you might want to adjust its configuration to match your patience.
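
One way to resolve the error, sketched below, is to pass an int and to annotate the return type as well. The parameter name value is my own choice; I would expect mypy to accept this version, since an int result is allowed where a float is declared.

```python
def foo(value: int) -> float:
    """Return value / 2 for even input and value // 2 for odd input."""
    return value / 2 if value % 2 == 0 else value // 2


print(foo(4))  # 2.0 (true division)
print(foo(5))  # 2 (floor division)
```

Whether the function should accept floats instead is a design decision; the type checker only makes the mismatch visible.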

Warning! Because of how types work in Python, you need to import them in your code before you can use them in annotations. New web frameworks, like Starlette, are 100% statically typed, while Django will likely never get there, as it was written differently. That’s not bad per se. It is what it is. Therefore, if you work on a Django project, MyPy might bring you more pain than pleasure. It’s definitely easier to use MyPy with microframeworks.
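
The import requirement looks like this in practice. The function first_even is a made-up example; the names List and Optional come from the standard typing module. On Python 3.9 and newer, built-in generics such as list[int] work without the import, but Optional still has to be imported.

```python
from typing import List, Optional


def first_even(numbers: List[int]) -> Optional[int]:
    """Return the first even number, or None when there is none."""
    for number in numbers:
        if number % 2 == 0:
            return number
    return None


print(first_even([3, 5, 8, 10]))  # 8
print(first_even([1, 3]))         # None
```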

Using pre-commit

A pre-commit hook is a git feature: a script that git runs right before each commit. The pre-commit framework, itself a Python package, makes it easy to plug such hooks into your project. It lets you define all the hooks in one YAML file and runs them automatically before you commit.
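
To show what the framework saves you from, here is a rough sketch of a hand-written hook that you would store as .git/hooks/pre-commit and make executable. The function names are hypothetical, it assumes flake8 is installed, and it runs a single check only.

```python
import subprocess
import sys
from typing import List


def select_python_files(paths: List[str]) -> List[str]:
    """Keep only the paths that look like Python source files."""
    return [path for path in paths if path.endswith(".py")]


def staged_files() -> List[str]:
    """Ask git for files that are staged as added, copied or modified."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def main() -> int:
    files = select_python_files(staged_files())
    if not files:
        return 0
    # A non-zero exit status makes git abort the commit.
    return subprocess.run(["flake8", *files]).returncode


# When installed as a hook, end the script with: sys.exit(main())
```

Every new check means more code like this in every repository, which is exactly the bookkeeping the pre-commit framework takes over.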

  1. Install it using

pip install pre-commit

or

brew install pre-commit

  2. Add a .pre-commit-config.yaml file to your project. Populate it with basic hooks (copy and paste)

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v3.2.0
  hooks:
    - id: check-merge-conflict
    - id: debug-statements
    - id: trailing-whitespace
    - id: end-of-file-fixer
    - id: check-yaml
    - id: check-toml
    - id: check-json

- repo: https://github.com/psf/black
  rev: 20.8b1
  hooks:
    - id: black

- repo: https://github.com/pycqa/flake8
  rev: 3.8.3
  hooks:
    - id: flake8

- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.740
  hooks:
    - id: mypy

  3. Stage your pre-commit configuration

git add .pre-commit-config.yaml

  4. Install the hook itself, defined in our .pre-commit-config.yaml

pre-commit install

  5. Autoupdate hooks if needed with

pre-commit autoupdate

  6. Execute the hooks without committing

pre-commit run --all-files

From now on, you are good to go.