Typechecking Python for fun (and profit?)

May 30, 2020 - python dev

I'm assuming you agree (or will consider) that adding some type-checking to your Python code can help you find bugs or otherwise improve your software. You've definitely heard of mypy, and possibly one or more of pytype, pyre, and pyright.

That's a lot of options! What should you use?

tl;dr conclusions

Use pytype, in your testing / continuous integration step (you do have one, I hope?)
- Unless you're using Python 3.8, in which case you can't yet, and you should use pyright instead
- Update, from the day after I wrote this: partial support is now here for Python 3.8 in pytype
If you want more constant assurance, use pyright in a commit hook.
If your editor has good support (e.g. PyCharm) that might suffice for you, but it's nice to have tools that work for all your collaborators who might not use the same editor

preamble & motivation

This is not a discussion of the whys and wherefores of type-checking Python code. Nor am I going to write a basic tutorial for getting started with it, or a detailed guide to its ins and outs (there's more than one). I'm also not going to be telling the world about how my large team of engineers implemented it in our million-line codebase for fun and profit.

I'm going to compare these four tools, for my use case, as of May 2020, in hopes that this will help someone else dip their toes in the water for their project. In that sense, this is more along the lines of a follow-up (or perhaps even a response) to this "field test" post from a few months ago -- my opinions have been shaped by using and re-evaluating these tools since early 2019 for production Python web and data science applications.

With that out of the way... here we go!

setup

So that you have a sense of what I'm working with (and how that may compare to what you're using):

A fresh Python 3.7.7 Conda environment on x86_64 GNU/Linux

A fresh checkout of dask/zict

it's a small but not trivial project, with test subdirectories that need to be excluded, optional dependencies, a little bit of clever code, and other real-world things
about 650 lines of code according to tokei
I'd hoped to be able to find a nice example of a type bug that was lurking undiscovered here, but I haven't found one, so this will have to serve as a sample "clean" codebase
(I've run pip install -e . in the checkout, to make sure my Python environment is set up)

A sample script with a number of errors I'd expect a typechecker to find:

In obvious_annotated_error, we expect an int argument that's then concatenated to a string
In unannotated_error, we call _innocuous_helper with an int when it's expecting a string
In less_obvious_error, we assume that the result from _ambiguous_returning_helper will be a string, but if the words we've passed in don't contain "hi", it'll be None
Conversely, there's also the AllKindsOfDynamic class which I hope the typechecker will leave undisturbed
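
For reference, here is a minimal sketch of what such a script could look like. It's a hypothetical reconstruction from the descriptions above: the function and class names come from this post, but the bodies, the argument values, and the layout are assumptions, so line numbers won't match the tool output shown later.

```python
# sample.py -- hypothetical reconstruction; bodies and values are assumptions.


def obvious_annotated_error(name: int):
    # Annotated as int, then concatenated to a str: TypeError at runtime.
    return "Hello, " + name


def _innocuous_helper(words):
    # Unannotated, but the body only works for a str.
    return words.split()


def unannotated_error():
    # Passes an int where the helper needs a str: AttributeError at runtime.
    return _innocuous_helper(42)


def _ambiguous_returning_helper(words):
    # Returns a str when "hi" is among the words, otherwise None.
    for word in words:
        if word == "hi":
            return word
    return None


def less_obvious_error(words):
    # Assumes a str came back; fails with AttributeError when it is None.
    return _ambiguous_returning_helper(words).upper()


class AllKindsOfDynamic:
    """Dynamic but sound: callbacks is always a list after __init__."""

    def __init__(self):
        # Set dynamically, so a checker can't easily infer the attribute.
        setattr(self, "callbacks", [])

    def register(self, callback):
        self.callbacks.append(callback)
        return callback

    def run(self):
        return [callback() for callback in self.callbacks]
```

Nothing fails at import time; each error only surfaces when the offending function is called, which is exactly why it's nice to have a tool catch them earlier.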

so, what are my options?

mypy

mypy is probably the first project that comes to mind when you think of typing and Python -- with good reason, since the development of mypy helped to drive a lot of the discussions and PEPs around typing in Python. It's actively developed, and there are lots of conference talks about it. It also benefits from the halo effect of being associated with Python's creator.

Mypy is sponsored by Dropbox.

configuration

(I'm using mypy==0.770)

No configuration seems to be necessary -- running mypy is as simple as pip install mypy followed by mypy zict/ or mypy sample.py. A setup.cfg or mypy.ini can be added, but I couldn't find a way to ignore the tests/ subdirectory within zict/. So you might end up running it over your tests, too. You'll probably also want --ignore-missing-imports to get started with, since otherwise mypy will complain about not having type information for all the libraries you use.

speed

At well under a second this is fast enough for me on a small project, and there is a long-running daemon available for larger projects.

$ time mypy -p zict --ignore-missing-imports
Success: no issues found in 18 source files

real    0m0.818s
user    0m0.754s
sys     0m0.063s

accuracy

For me, mypy falls down on accuracy and useful errors -- there's no happy medium. Without any options, mypy only finds 1 out of 3 expected errors in sample.py:

$ mypy -m sample
sample.py:6: error: Unsupported operand types for + ("str" and "int")
Found 1 error in 1 file (checked 1 source file)

With the --strict option, it demands that type annotations be added, but doesn't catch any more errors:

$ mypy -m sample --strict
sample.py:4: error: Function is missing a return type annotation
sample.py:6: error: Unsupported operand types for + ("str" and "int")
sample.py:9: error: Function is missing a type annotation
sample.py:14: error: Function is missing a type annotation
sample.py:18: error: Call to untyped function "_innocuous_helper" in typed context
sample.py:21: error: Function is missing a return type annotation
sample.py:33: error: Function is missing a type annotation
sample.py:49: error: Function is missing a type annotation
sample.py:54: error: Function is missing a type annotation
sample.py:61: error: Call to untyped function "AllKindsOfDynamic" in typed context
Found 10 errors in 1 file (checked 1 source file)

pytype

There's been less noise made about pytype, but I've seen some mentions at conferences and the occasional tutorial.

Pytype is sponsored by Google.

configuration

(I'm using pytype==2020.5.13)

pytype will work if you just point it at your source directory, but in order to get it to ignore your test files you need a configuration file -- this is pytype.cfg by default (and there's a handy --generate-config option to create one) but it'll read setup.cfg too. If you use a configuration file, though, you have to configure everything within it -- including where to look for code to typecheck.

If you don't mind running it over your test files, I recommend the --keep-going command-line option so it reports all errors rather than stopping at the first one.

My trimmed-down pytype.cfg:

[pytype]
# Space-separated files / directories to exclude.
exclude =
    **/versioneer.py
    **/tests/**
    **/test_*.py
# Space-separated files / directories to process.
inputs =
    zict/
# Keep going past errors, analyze as many files as possible.
keep_going = True

speed

The biggest issue I have with pytype is that it's slow. Even on this small project it's slow enough that I would be irked by running it afresh every commit:

$ time pytype --config pytype.cfg
Computing dependencies
Analyzing 11 sources with 0 local dependencies
ninja: Entering directory `/home/<...>/.pytype'
[11/11] check conf
Success: no errors found

real    0m5.651s
user    0m13.495s
sys     0m0.339s

It does have nice incremental checks based on ninja, so subsequent runs are certainly fast enough:

$ time pytype --config pytype.cfg
Computing dependencies
Analyzing 11 sources with 0 local dependencies
ninja: Entering directory `/home/<...>/.pytype'
ninja: no work to do.
Success: no errors found

real    0m0.556s
user    0m0.480s
sys     0m0.076s

However, I've found that occasionally pytype generates flaky results and clearing out the .pytype/ cache directory and re-running it fixes things. So I can't entirely shake a mistrust of the incremental builds, and the slowness irks me all the more.
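
If you hit the same flakiness, clearing the cache is easy to script. This is a small sketch, assuming the cache lives in a .pytype/ directory at the project root as in the runs above; the function name clear_pytype_cache is made up for illustration.

```python
import shutil
from pathlib import Path


def clear_pytype_cache(project_root: str = ".") -> bool:
    """Delete <project_root>/.pytype if it exists; return True if removed."""
    cache_dir = Path(project_root) / ".pytype"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False
```

Calling clear_pytype_cache() before rerunning pytype --config pytype.cfg forces a fresh (and slow) analysis, so it's best saved for when results look suspicious.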

accuracy

This is where pytype shines for me -- it caught all three of the real errors in sample.py, with partial tracebacks pointing out the error, and didn't touch the perfectly sound AllKindsOfDynamic class:

$ pytype sample.py
Computing dependencies
Analyzing 1 sources with 0 local dependencies
ninja: Entering directory `/home/<...>/.pytype'
[1/1] check sample
FAILED: /home/<...>/.pytype/pyi/sample.pyi
/home/<...>/bin/python -m pytype.single --imports_info \
  /home/<...>/.pytype/imports/sample.imports --module-name sample -V 3.7 -o /home/<...>/.pytype/pyi/sample.pyi --analyze-annotated --nofail --quick /home/<...>/sample.py
File "/home/<...>/sample.py", line 6, in obvious_annotated_error: unsupported operand type(s) for +: 'str' and 'int' [unsupported-operands]
  Function __add__ on str expects str
File "/home/<...>/sample.py", line 11, in _innocuous_helper: No attribute 'split' on int [attribute-error]
Called from (traceback):
  line 18, in unannotated_error
File "/home/<...>/sample.py", line 39, in less_obvious_error: No attribute 'upper' on None [attribute-error]
  In Optional[str]

For more details, see https://google.github.io/pytype/errors.html.
ninja: build stopped: subcommand failed.

extras

The merge-pyi script that comes with pytype is interesting -- it can take the inferred .pyi type stub file generated by pytype and merge the annotations back into your code! In my experience, pyre's infer subcommand (see below) does this just as well and saves you the step of hunting down the generated .pyi file, but it's very impressive nonetheless.

pyre

Not to be outdone by other companies, Facebook sponsors pyre, which has even fewer conference talks about it (just the one hit that I could find) and has the temerity not even to be written in Python! Performance is the claim all over its website.

configuration

(I'm using pyre-check==0.0.46)

Unlike mypy or pytype, pyre won't run without any kind of configuration -- it complains until pyre init is run to generate a JSON-formatted .pyre_configuration file. It took me a little fiddling with the "source_directories" setting to get pyre to run without throwing up a lot of spurious "Undefined import" errors -- and it looks like the only way to silence them is to add # pyre-ignore-all-errors[21] to all the files affected.

speed

For an initial run, this is barely fast enough for me to want to run for each commit -- it's more than twice as fast as pytype, but much slower than mypy.

$ time pyre check
 ƛ No type errors found

real    0m1.720s
user    0m0.480s
sys     0m0.099s

It's worth noting there's an incremental command that spins up a server in the background -- this will work with LSP for VS Code & Nuclide to incrementally run additional checks as code changes. I don't use either VS Code or Nuclide and wasn't able to get it to work, unfortunately.

accuracy

Unfortunately, pyre only found 1 of 3 expected errors in sample.py:

$ pyre check
 ƛ Found 1 type error!
foo/sample.py:6:11 Incompatible parameter type [6]: Expected `int` for 1st positional only parameter to call `int.__radd__` but got `str`.

extras

I like the variety and personality of subcommands available -- they had me at rage for verbose debugging. But I'm also very impressed by the infer subcommand that can, with the right flags, add annotations to your code in-place! Unlike the merge-pyi helper supplied by pytype, this doesn't require you to generate and specify a separate .pyi file -- which I for one think is very handy.

pyright

Another entry that isn't even written in Python is pyright, sponsored by Microsoft, and probably your best option if you're a VS Code user because it's readily available as a VS Code extension.

Since it's written in TypeScript, you'll need to install it through npm: npm install pyright@1.1.38

configuration

pyright is similar to pytype in that it works fairly seamlessly without a configuration file, but if you want more tuning (e.g. excluding test files) you need to add a pyrightconfig.json file. I used this configuration:

{
  "include": [
    "zict"
  ],

  "exclude": [
    "**/node_modules",
    "**/__pycache__",
    "**/tests/**"
  ],
  "reportMissingImports": false
}

I should add that pyrightconfig.json exposes a lot of options for warnings to tune, and I found the documentation quite comprehensive and helpful.

A warning! Folks at a previous company found that by default pyright also wanted to check their entire virtual environment, if it happened to be next to their code. I suggest you use a pyrightconfig.json and be sure to explicitly exclude your virtual environment / pyenv directories.
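
As a sketch of that suggestion, the snippet below builds the same configuration as above with a couple of environment directories added to "exclude". The directory names (.venv, venv) are assumptions, so substitute wherever your environment actually lives; render_pyright_config is a made-up helper name.

```python
import json

# Same settings as the pyrightconfig.json above, plus environment
# directories. The ".venv" / "venv" names are assumptions.
PYRIGHT_CONFIG = {
    "include": ["zict"],
    "exclude": [
        "**/node_modules",
        "**/__pycache__",
        "**/tests/**",
        "**/.venv",
        "**/venv",
    ],
    "reportMissingImports": False,
}


def render_pyright_config(config: dict) -> str:
    """Serialize the settings as JSON text for pyrightconfig.json."""
    return json.dumps(config, indent=2) + "\n"
```

Writing the result of render_pyright_config(PYRIGHT_CONFIG) to pyrightconfig.json next to your code gives you one place to keep the exclusions in sync across projects.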

speed

No complaints as far as speed goes -- it's barely over a second for a fresh run, and there's a --watch option for a long-running process:

$ pyright
Loading configuration file at /home/<...>/pyrightconfig.json
stubPath /home/<...>/typings is not a valid directory.
Searching for source files
Found 11 source files
/home/<...>/zict/doc/source/conf.py
  69:16 - error: "__version__" is not a known member of module (reportGeneralTypeIssues)
/home/<...>/zict/zict/buffer.py
  61:19 - error: "Unknown" is not iterable
  70:19 - error: "Unknown" is not iterable
/home/<...>/zict/zict/lru.py
  66:23 - error: "Unknown" is not iterable
  88:19 - error: "Unknown" is not iterable
5 errors, 0 warnings
Completed in 1.027sec

accuracy

Here's where we find my biggest gripe with pyright -- it's limited in working with more dynamic Python. As with the spurious errors when checking zict, it's also complaining about self.callbacks in our sample script -- even though careful examination of the code will show self.callbacks should always be a list.

It also only found one of the three expected errors in our file:

$ pyright sample.py
stubPath /home/<...>/typings is not a valid directory.
Searching for source files
Found 1 source file
/home/<...>/sample.py
  6:12 - error: Operator "+" not supported for types "Literal['Hello, ']" and "int" (reportGeneralTypeIssues)
  56:19 - error: "Unknown" is not iterable
2 errors, 0 warnings
Completed in 0.619sec

conclusions

So, to sum things up, and expand on the reasoning for my tl;dr section above:

mypy is fast and the reference implementation, but doesn't catch all the errors we'd want
pytype is the most accurate, but slowest, and only partially supports Python 3.8 (see the update in the tl;dr)
pyre is slower and no more accurate than mypy, but does have helpful tools like the infer subcommand
pyright is almost as fast as mypy, and the most configurable, but reports unnecessary errors

To echo what I said above, then: I've found pytype really helpful in a continuous integration step, where I'm expecting my tests to take a few seconds to run anyway, so another 10-15 seconds is much less painful. (One work project regularly took over a minute.) For constant checking, because of its configurability, I like pyright.
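
One way to wire that split up is a tiny runner shared by the CI step and the commit hook. This is only a sketch: run_checks and the two command lists are hypothetical names, and the commands are simply the invocations used earlier in this post.

```python
import subprocess
from typing import List, Sequence


def run_checks(commands: Sequence[List[str]]) -> int:
    """Run each command in turn; return 0 only if all of them succeed."""
    failures = 0
    for command in commands:
        print("$ " + " ".join(command))
        try:
            result = subprocess.run(command)
        except FileNotFoundError:
            # The tool isn't installed; count that as a failed check.
            print("not found: " + command[0])
            failures += 1
            continue
        if result.returncode != 0:
            failures += 1
    return 1 if failures else 0


# Slow but accurate: run in continuous integration.
CI_COMMANDS = [["pytype", "--config", "pytype.cfg"]]
# Fast: run from a commit hook.
HOOK_COMMANDS = [["pyright"]]
```

A CI script would end with sys.exit(run_checks(CI_COMMANDS)), and a commit hook would do the same with HOOK_COMMANDS, so a failing checker blocks the build or the commit.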

I should add that mypy continues to improve rapidly. And while pyre doesn't serve my day-to-day needs as well, I've used the infer subcommand effectively before and liked it.

Any additional typechecking you add to your project will, in my opinion, probably help -- but these are the tools I recommend.