Jun 15, 2020

Consistent Python environments with Poetry and pre-commit hooks

Clean and Consistent Environments

Regardless of the programming language you are working in, it can sometimes be a struggle to maintain a clean codebase and a consistent development environment for all members of your team, especially if your teammates are split between platforms or use different editors.  However, you can simplify this process with three straightforward strategies: 1) set up your Git attributes appropriately, 2) use Poetry to manage your development environment, and 3) enforce a coding standard through pre-commit hooks.   Below, I’ll dive into each strategy in more detail. Check out my sample apologies repo to see how this works in practice.

Git Attributes

If your teammates work in the same codebase from both UNIX-like platforms (macOS or Linux) and Windows, then it’s especially important to set up Git attributes to manage line endings.  However, even if you’re only working on a single platform, it’s still a good idea.  Git makes it fairly easy to shoot yourself in the foot, and diagnosing a problem related to line endings can be confusing.   To manage line endings, set up your .gitattributes file appropriately.   This StackOverflow answer has a good discussion of the options, but for new projects it’s really as simple as grabbing the appropriate content from this collection of .gitattributes templates.    For an existing project, you may also need to do a one-time normalization step as described in this GitHub documentation.
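As a rough illustration, a minimal .gitattributes for a Python project might look something like the sketch below. The specific file patterns are my assumptions for the example, not content copied from the templates mentioned above, so adjust them to match the files your project actually contains.

```text
# Let Git detect text files and normalize their line endings in the repository
* text=auto

# Illustrative patterns only; adapt to your own project
*.py  text
*.sh  text eol=lf
*.bat text eol=crlf
*.png binary
```

The first rule does most of the work. The explicit eol settings matter only for files that must keep a specific line ending on every platform, such as shell scripts and Windows batch files.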

Poetry in Motion

One of the more difficult things to manage for any Python project is the dependencies and the resulting development environment. Most people rely on Python virtual environments, but then you still need to make sure that everyone on your team is using the same setup. There are a variety of different mechanisms available (for instance, our own Mike Hostetler blogged about direnv back in June of 2019). My favorite solution is Poetry. Poetry is a single tool that is used both to manage project dependencies and to construct and utilize a virtual environment based on those dependencies. It also manages the process of publishing code to a repository such as PyPI. Your teammates simply need to install a Python interpreter and the Poetry tool itself (which is as simple as brew install poetry on macOS), and then everything else happens auto-magically from there.

For instance, if you clone my apologies repo, you can do this:

localhost:~/projects/repos/apologies> poetry install
Creating virtualenv apologies-pSRSS4B3-py3.7 in /Users/kpronovici/Library/Caches/pypoetry/virtualenvs
Installing dependencies from lock file
- Installing chardet (3.0.4)
- Installing idna (2.9)
- Installing markupsafe (1.1.1)
- Installing sphinx-autoapi (1.2.1)
- Installing tox (3.14.5)
- Installing apologies (0.1.6)

Once that’s done, Poetry manages your virtual environment for you.  You can use poetry add or poetry remove to add and remove dependencies, which are tracked in pyproject.toml.  The virtual environment is automatically updated to include those dependencies.  If new dependencies are added, developers can refresh their environment using poetry install.

Even better, developer-only dependencies can be added with the --dev switch. This means that any tool you want your developers to have access to can be managed by Poetry. For instance, in my project, the developer dependencies include Pylint. I can run Pylint out of the Poetry virtualenv using poetry run pylint.
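To make this concrete, here is a sketch of how the dependency sections of pyproject.toml might look after running something like poetry add attrs and poetry add --dev pylint. The project name, the attrs dependency, and the version constraints are placeholders I made up for illustration; they are not taken from the apologies repo. Note also that newer Poetry releases (1.2 and later) organize developer dependencies into dependency groups rather than a dev-dependencies section.

```toml
[tool.poetry]
name = "example-project"          # hypothetical project name
version = "0.1.0"
description = "Illustrative project, not a real package"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
attrs = "^19.3.0"                 # example runtime dependency

[tool.poetry.dev-dependencies]
pylint = "^2.5.0"                 # only installed for developers
```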

Poetry goes a long way toward making dependencies and virtual environments simple to use, and it’s worth your time to look into it.

Pre-Commit Hooks for Standards

Once you’ve made it easy for all of your teammates to be working with a consistent development environment, turn your attention to code consistency.   I take a two-pronged approach to coding standards, with some tools focused on code formatting and other tools focused on code quality.

For code formatting, I rely on Black and isort. Black ensures that everyone’s code looks the same, while isort makes sure that imports are sorted and grouped in a consistent manner. With some care, you can configure isort so that it formats import statements in exactly the same way as Black would, avoiding conflicts.
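One way to get the two tools to agree is sketched below, using settings in pyproject.toml. This is a commonly used Black-compatible configuration rather than necessarily the exact one in my repo. If you are on isort 5 or later, the single setting profile = "black" is a shorthand for the same idea.

```toml
[tool.black]
line-length = 88                  # Black's default, stated explicitly

[tool.isort]
multi_line_output = 3             # vertical hanging indent, like Black
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
line_length = 88                  # must match Black's line length
```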

For code quality, I rely on Pylint and MyPy.   Pylint enforces a coding standard and is also a general linter.  MyPy is a static type checker for Python.  Between these tools, I can catch most problems.  Some people find Pylint to be more trouble than it’s worth, and use the Flake8 linter instead.
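As a small illustration of the kind of problem a static type checker catches, consider the sketch below. The function names are hypothetical and exist only for this example. The code runs fine as written, but if you removed the None check, MyPy would be expected to report an unsupported operand type, because the value might be None.

```python
from typing import List, Optional


def average(scores: List[float]) -> Optional[float]:
    """Return the mean of scores, or None when the list is empty."""
    if not scores:
        return None
    return sum(scores) / len(scores)


def percent_correct(scores: List[float]) -> float:
    """Convert the average score (0.0 to 1.0) into a percentage."""
    result = average(scores)
    # Without this check, a type checker should flag the multiplication
    # below, since result has type Optional[float] and may be None.
    if result is None:
        return 0.0
    return result * 100.0


if __name__ == "__main__":
    print(percent_correct([0.5, 1.0]))
    print(percent_correct([]))
```

A linter like Pylint would not necessarily notice this class of bug, which is why I find the two tools complementary.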

Once you configure your code formatting and code quality tools, the next thing you need to do is make sure that everyone uses those tools consistently.  One approach is to apply the checks during your continuous integration (CI) process, failing the build if standards are not met.  This is important, but I consider it a fallback.  Instead, I recommend that you apply these checks as pre-commit hooks.   This way, non-standard code never has a chance to enter the repository.

To manage pre-commit hooks, I use the pre-commit package.   This package relies on a file called .pre-commit-config.yaml in the root of your repository.  When a new developer joins the team and clones the repository, they can enable all of the pre-commit hooks using poetry run pre-commit install.  For my sample apologies repo, a commit now looks like this:

localhost:~/projects/repos/apologies> git commit -m "Release v0.1.6" pyproject.toml
[master 8e9e6e6] Release v0.1.6
1 file changed, 1 insertion(+), 1 deletion(-)

If any of the hooks fails, then the commit won’t complete.  For instance, if Black updates formatting, then the newly-updated files will be left in the repo and will need to be added to the commit.  Or, if Pylint finds errors, the developer will need to fix those errors before committing.

The pre-commit package has a list of supported tools and knows how to create a Python virtualenv to install and run those tools.  Since I already have Poetry to manage my virtualenv, that’s overkill for me.  Instead, I configure everything as a “local” hook, and execute the tools via poetry run.  That way, developers have an easy way to run the exact same checks outside of the pre-commit hook, both from the command-line and within an IDE like IntelliJ.
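A sketch of what such a .pre-commit-config.yaml could look like is shown below. The hook ids and names are my own choices for illustration, and the real file in the apologies repo may pass additional arguments to each tool, so treat this as a starting point rather than a copy.

```yaml
repos:
  - repo: local
    hooks:
      - id: black
        name: Black
        entry: poetry run black     # runs Black from the Poetry virtualenv
        language: system
        types: [python]
      - id: isort
        name: isort
        entry: poetry run isort
        language: system
        types: [python]
      - id: mypy
        name: MyPy
        entry: poetry run mypy
        language: system
        types: [python]
      - id: pylint
        name: Pylint
        entry: poetry run pylint
        language: system
        types: [python]
```

With language: system, pre-commit does not build its own environment for the hook. It simply runs the entry command and appends the staged Python files as arguments.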

MyPy and Pylint can sometimes be fairly slow, especially for large projects.  If this is the case, you might need to rely on the CI system to enforce these standards instead.  Everything is a compromise, so choose the tools and process that add the most value for your team.

About the Author

Ken Pronovici

Sr. Consultant

Ken has extensive experience working as a software developer, technical lead, architect, and product owner in a variety of industries.  He has been a Debian Developer since 2003, and is a contributor to other open source projects.   He enjoys creating systems – not just software – and is comfortable working as an individual contributor, managing a software development team, or consulting as a specialist across a variety of languages, platforms, and development environments.
