Quality Assurance for a Python Project

Write high-quality code

When buying apples at the market, it’s fairly simple to pick the best ones. You cherry-pick by touching them and checking color, ripeness, and the absence of visible bruising. This process is called quality control: you choose only good-quality products that meet your requirements (unless you are buying apples for apple pie). Things get complicated when there are tons of apples at a sorting station. Ensuring good product quality by hand becomes highly difficult, and only automating the process can come to the rescue.

The same problem applies to software. When you work solo on a small project, it’s pretty simple to keep an eye on code formatting, good coding practices, package versioning, and testing. However, as the project grows and new scripts and dependencies are added, it gets hard to check the quality manually. This is the moment when quality assurance comes into play.

Quality assurance (QA) is a common practice to ensure that the end product is compliant with a predetermined set of company standards. It’s a proactive approach that focuses on preventing defects at the process level. It aims at setting up adequate processes and introducing a standard of quality.

Figure: QA, QC, and testing in the software development process (based on the article)

Quality control (QC) is a reactive approach: it verifies that a product corresponds to the requirements and specifications before it’s released. The aim of this process is to verify product quality.

Testing is detecting errors and undesirable functionality in the software. It focuses on source code and design.

Why is QA so important? To put it briefly, it affects software reliability, maintenance cost, product improvement, and much more.

QA steps for a Python project

Once we know the purpose of applying QA in our project, let’s focus on the practical aspect.

If you still have problems declaring, installing, and managing the dependencies of Python projects, try out the Poetry package. Setting up a Python environment has never been so easy.

Below you’ll find the proposed steps that might be included in a QA script. You may include all of them in your project or pick only those relevant to you. However, some of them are highly recommended, especially testing.

Let’s go briefly through all of them. Each step of QA has different functionality and purposes.

pylint – error checker
Pylint analyses your code without actually running it (static code analysis). By default, its style checks follow the recommendations of PEP 8.
black – code formatter
Black ensures that all code in your codebase follows the same styling rules.
isort – import sorter
Sorts imports alphabetically and automatically separates them into sections and by type. Useful when dealing with a great number of imports.
mypy – static type checker
It checks that you’re using variables and functions in your code consistently with their declared types. Just add type hints (PEP 484) to your Python programs, and mypy will warn you when you use those types incorrectly.
pydocstyle – docstring convention compliance checker
Checks docstrings against a docstring convention. Use the --convention option to choose an existing convention and, with it, the basic list of checked errors. Possible conventions: pep257, numpy, google.
bandit – security issue finder
It reports potential security issues, for instance, vulnerabilities, insecure cryptographic practices, hardcoded secrets, and many more.
pytest – testing framework
The framework allows you to write various types of software tests, including unit tests, integration tests, end-to-end tests, and functional tests. Some of its features are parametrized testing, fixtures, and assert rewriting.

Although the libraries mentioned above are well-known old-timers, don’t hesitate to replace them with alternatives that suit you better (the yapf code formatter from Google, the Ruff linter, etc.).
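To make the checks above more concrete, here is a minimal sketch of code written with these tools in mind: type hints for mypy, a numpy-style docstring for pydocstyle, and a plain-assert test function of the kind pytest collects. The names mean and test_mean are hypothetical and are not taken from any real project.

```python
"""Hypothetical example module with a typed, documented function."""

from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Return the arithmetic mean of a sequence of numbers.

    Parameters
    ----------
    values : Sequence[float]
        Non-empty sequence of numbers.

    Returns
    -------
    float
        Sum of the values divided by their count.

    Raises
    ------
    ValueError
        If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def test_mean() -> None:
    """Check the mean of a small sample (pytest-style test)."""
    assert mean([1.0, 2.0, 3.0]) == 2.0
```

With the type hints in place, a call such as mean("abc") is the kind of mistake mypy is designed to report before the code is ever run. In a real project the test function would normally live in a separate file under tests/.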

Template of a QA script for a Python module

Let’s consider the following project structure:

.
├── my_module
│ └── ...
├── tests
│ ├── conftest.py
│ ...
├── README.md
└── qa.sh

A ready-to-use QA script might look as follows (developed by Piscada):

#!/bin/bash
echo "======== pylint ========"
python -m pylint my_module
exit_pylint=$?
echo ""
echo "======== black ========"
python -m black --check --diff my_module
exit_black=$?
echo ""
echo "======== isort ========"
python -m isort my_module --check-only --diff
exit_isort=$?
echo ""
echo "======== mypy ========"
python -m mypy my_module
exit_mypy=$?
echo ""
echo "======== pydocstyle ========"
python -m pydocstyle --convention=numpy my_module
exit_pydocstyle=$?
echo ""
echo "======== bandit ========"
python -m bandit -r my_module
exit_bandit=$?
echo ""
echo "======== pytest ========"
python -m pytest tests/
exit_pytest=$?
echo ""
echo "======== exit status ========"
echo "pylint: $exit_pylint, black: $exit_black, isort: $exit_isort, mypy: $exit_mypy, pydocstyle: $exit_pydocstyle, bandit: $exit_bandit, pytest: $exit_pytest"
! (( exit_pylint || exit_black || exit_isort || exit_mypy || exit_pydocstyle || exit_bandit || exit_pytest ))
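The control flow of the script above (run every tool, remember each exit code, and fail if any of them is non-zero) can also be sketched in Python. This is only an illustration under stated assumptions, not part of the original script: the function names run_steps and overall_status are hypothetical.

```python
import subprocess
from typing import Dict, List


def run_steps(steps: Dict[str, List[str]]) -> Dict[str, int]:
    """Run every command and return its exit code, keyed by step name."""
    exit_codes: Dict[str, int] = {}
    for name, command in steps.items():
        print(f"======== {name} ========", flush=True)
        # check=False: a failing step must not stop the remaining steps.
        exit_codes[name] = subprocess.run(command, check=False).returncode
    return exit_codes


def overall_status(exit_codes: Dict[str, int]) -> int:
    """Return 0 only if every step succeeded, otherwise 1."""
    return 0 if all(code == 0 for code in exit_codes.values()) else 1
```

A caller would pass a mapping such as {"pylint": [sys.executable, "-m", "pylint", "my_module"]}, print the returned dictionary as the summary, and hand overall_status(...) to sys.exit. As in the shell version, every step runs even if an earlier one fails, so a single run reports all problems at once.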

Each step can be customized by adding or changing the flags passed to the commands. Alternatively, the configuration of most of these tools can be defined in the pyproject.toml file, which is also where Poetry keeps its project settings.

# [...] poetry configuration sections 

[tool.black]
line-length = 88 # default value, but can be changed to 79, 80, or any other

[tool.pylint.messages_control]
good-names = ["df", "np", "pd"]

[tool.pylint.format]
max-line-length = 88

[tool.isort]
profile = "black"
line_length = 88

[tool.mypy]
no_implicit_optional = false

[tool.pydocstyle]
convention = "numpy"

# ...

For instance, change the convention option for pydocstyle if you use a docstring convention other than numpy. Refer to each package’s documentation when changing the configuration.

Keep in mind that some packages might be incompatible with each other without appropriate configuration. For instance, black and isort will keep undoing each other’s changes. To prevent this, define profile = "black" for isort.

Semantic versioning

Additionally, you can add a QA step to check that the project version defined in pyproject.toml is valid and not already used. For semantic versioning validation, the semver library can be used. Moreover, the following code, which uses the GitPython package, checks whether a tag already exists in the repository.

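import git  # provided by the GitPython package

# Open the repository located in the current directory.
repo = git.Repo(".git/")
new_tag = "0.1.1"

# repo.tags supports membership tests by tag name.
if new_tag in repo.tags:
    print("Tag exists.")
else:
    print("Tag doesn't exist.")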
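The validation half of this step can be sketched without extra dependencies. The snippet below is a stand-in for the semver library, not its API: it uses a pattern based on the regular expression suggested on semver.org, and the helper names is_valid_semver and is_new_version are hypothetical.

```python
import re
from typing import Iterable

# Based on the regular expression suggested on semver.org (SemVer 2.0.0):
# MAJOR.MINOR.PATCH, optional -prerelease, optional +build metadata.
SEMVER_PATTERN = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_valid_semver(version: str) -> bool:
    """Return True if the whole string is a valid semantic version."""
    return SEMVER_PATTERN.fullmatch(version) is not None


def is_new_version(version: str, existing_tags: Iterable[str]) -> bool:
    """Return True if the version is valid and not among the existing tags."""
    return is_valid_semver(version) and version not in set(existing_tags)
```

With GitPython, the existing tag names could be collected as [tag.name for tag in repo.tags] and passed as existing_tags. This assumes tags are named exactly like the version, without a prefix such as "v".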

Running QA

First, give the qa.sh file execute permission:

chmod +x qa.sh

The QA script can be run in several ways, e.g.:

./qa.sh
# or
bash qa.sh

Avoid sh qa.sh on systems where sh is not bash: the script relies on the bash-specific (( )) syntax.

If you use the Poetry package manager, remember to prefix the command with poetry run, e.g.: poetry run ./qa.sh

At the end of the script output, a summary of exit codes is displayed. In the ideal scenario, every tool reports 0 and the summary looks like this:

======== exit status ========
pylint: 0, black: 0, isort: 0, mypy: 0, pydocstyle: 0, bandit: 0, pytest: 0

When to run the QA script?

Running QA only just before merging your branch into master may result in many errors, warnings, and frustration.

A good practice is to run the QA script every time you push changes to your branch. This lets you control the quality of your code on an ongoing basis, so any errors or non-compliance with the standards are caught early.

For instance, the QA script can be run automatically by using a pre-commit hook.
Git hooks are scripts that Git can execute automatically when certain events occur. A pre-commit hook is executed before each commit is created. For this purpose, create a symbolic link as follows (run from the repository root; the link target is resolved relative to .git/hooks, hence the ../../ prefix):

ln -s ../../qa.sh .git/hooks/pre-commit

Another solution is to run the QA script in your CI/CD pipeline. You don’t need to run it manually, because everything is automated. Consider using, e.g., GitHub Actions or Bitbucket Pipelines for this purpose.

Conclusion

The later a bug is caught in the development life cycle, the worse it is for the business. Detecting errors is just one thing; the second is to find the cause and fix the bug. Without good-quality code, debugging gets complex, time-consuming, and costly for the business. Not to mention that team members lose morale and nobody rushes to take over the fixing task.

That is why it’s so important to implement quality assurance in your project. During this process, not only the functionality of your code is checked but also its compliance with accepted standards, such as code formatting conventions. As a result, the code is more readable and easier to maintain.

Aleksandra

Full-stack data scientist. Aleksandra writes about topics that she finds interesting for software developers and scientists who are keen on technology. Fan of Polish dumplings with blueberries.