Introduction to Mypyc for Contributors
This is a short introduction aimed at anybody who is interested in contributing to mypyc, or anybody who is curious to understand how mypyc works internally.
Code compiled using mypyc is often much faster than the same code running under CPython, because mypyc does these things differently:
- Mypyc generates C that is compiled to native code, whereas CPython compiles to byte code that it then interprets. Interpreted byte code always has some interpreter overhead, which slows things down.
- Mypyc doesn't let you arbitrarily monkey patch classes and functions in compiled modules. This allows early binding: mypyc statically binds calls to compiled functions, instead of going through a namespace dictionary. Mypyc can also call methods of compiled classes using vtables, which are more efficient than the dictionary lookups used by CPython.
- Mypyc compiles classes to C extension classes, which are generally more efficient than normal Python classes. They use an efficient, fixed memory representation (essentially a C struct). This lets us use direct memory access instead of (typically) two hash table lookups to access an attribute.
- As a result of early binding, compiled code can use C calls to call compiled functions. Keyword arguments can be translated to positional arguments during compilation. Thus most calls to native functions and methods map directly to simple C calls. CPython calls are quite expensive, since mapping of keyword arguments, *args, and so on has to mostly happen at runtime.
- Compiled code has runtime type checks to ensure that runtime types match the declared static types. Compiled code can thus make assumptions about the types of expressions, resulting in both faster and smaller code, since many runtime type checks performed by the CPython interpreter can be omitted.
- Compiled code can often use unboxed (not heap allocated) representations for integers, booleans and tuples.
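To make these points concrete, here is a small, hypothetical module of the kind that should benefit from them. The names (`Vec`, `scale`, `total_length_squared`) are invented for illustration. The comments describe which of the optimizations above would plausibly apply, not what the generated C actually looks like.

```python
# Hypothetical example module; the names are illustrative, not from mypyc.
from typing import List


class Vec:
    # When compiled, this would become a C extension class whose x and y
    # attributes live in a fixed struct layout rather than in a dictionary.
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def length_squared(self) -> int:
        # Integer arithmetic like this can often use unboxed values.
        return self.x * self.x + self.y * self.y


def scale(v: Vec, factor: int = 2) -> Vec:
    return Vec(v.x * factor, v.y * factor)


def total_length_squared(vecs: List[Vec]) -> int:
    total = 0
    for v in vecs:
        # The keyword argument can be mapped to a positional one at compile
        # time, and both calls can be bound early to compiled code.
        total += scale(v, factor=3).length_squared()
    return total


result = total_length_squared([Vec(1, 2), Vec(3, 4)])
print(result)
```

The module runs unchanged under the interpreter, which makes it easy to compare interpreted and compiled behavior.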
Mypyc supports a large subset of Python. Note that if you try to compile something that is not supported, you may not always get a very good error message.
Here are some major things that aren't yet supported in compiled code:
- Some dunder methods don't work, though most of them are supported
- Monkey patching compiled functions or classes
- General multiple inheritance (a limited form is supported)
- Async generators
- The match statement
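As an illustration of the monkey patching item above, the following sketch replaces a method on a class at runtime. The class `Greeter` and function `shout` are hypothetical. Interpreted CPython accepts this; based on the list above, it should not be expected to work if the class lives in a compiled module.

```python
# Hypothetical illustration; Greeter and shout are invented names.
class Greeter:
    def greet(self) -> str:
        return "hello"


def shout(self: Greeter) -> str:
    return "HELLO"


g = Greeter()
before = g.greet()

# Monkey patch: replace the method on the class at runtime. The interpreter
# looks the method up dynamically, so existing instances see the new method.
Greeter.greet = shout  # type: ignore
after = g.greet()

print(before, after)
```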
We are generally happy to accept contributions that implement new Python features.
First you should set up the mypy development environment as described in the mypy docs. macOS, Linux and Windows are supported.
When working on a mypyc feature or a fix, you'll often need to run compiled code. For example, you may want to do interactive testing or to run benchmarks. This is also handy if you want to inspect the generated C code (see Inspecting Generated C).
Run mypyc to compile a module to a C extension using your development version of mypyc:
$ mypyc program.py
This will generate a C extension for program in the current working directory. For example, on a Linux system the generated file may be called program.cpython-37m-x86_64-linux-gnu.so.
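The exact file name depends on the platform and Python version. As a rough sketch, the extension suffixes that the running interpreter recognizes can be listed with the standard library. This assumes the generated file uses one of these suffixes, which is the usual convention for C extensions.

```python
import importlib.machinery

# "program" is the module name from the example command above.
module_name = "program"

# EXTENSION_SUFFIXES lists the file suffixes this interpreter accepts for
# C extension modules, most specific first.
candidates = [
    module_name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES
]

for name in candidates:
    print(name)
```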
Since C extensions can't be run as programs, use python3 -c to run the compiled module as a program:
$ python3 -c "import program"
Note that __name__ in program.py will now be program, not __main__!
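This matters if the module relies on the common main guard. A minimal sketch, assuming a hypothetical program.py with a `main` function:

```python
# program.py: hypothetical contents for illustration.
def main() -> str:
    return "program ran"


if __name__ == "__main__":
    # True for `python3 program.py` (interpreted), but False when the module
    # is imported, because __name__ is then "program".
    print(main())
```

With a layout like this, the compiled module would need an explicit call, for example `python3 -c "import program; program.main()"`.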
You can manually delete the C extension to get back to an interpreted version (this example works on Linux):
$ rm program.*.so
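On other platforms the suffix differs, so a portable cleanup can be sketched in Python. The helper `remove_compiled` below is hypothetical and not part of mypyc. It only removes files whose suffix the running interpreter would load, so extensions built for other Python versions are left alone. The demo at the end uses an empty placeholder file in a temporary directory rather than a real extension.

```python
import importlib.machinery
import tempfile
from pathlib import Path
from typing import List


def remove_compiled(module_name: str, directory: Path) -> List[Path]:
    """Delete C extension files for module_name in directory; return them."""
    removed = []
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = directory / (module_name + suffix)
        if candidate.is_file():
            candidate.unlink()
            removed.append(candidate)
    return removed


# Demo with a placeholder file standing in for a compiled extension.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    source = tmp_dir / "program.py"
    source.write_text("x = 1\n")
    fake_ext = tmp_dir / ("program" + importlib.machinery.EXTENSION_SUFFIXES[0])
    fake_ext.write_bytes(b"")

    removed = remove_compiled("program", tmp_dir)
    source_survived = source.exists()
    ext_survived = fake_ext.exists()
```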
Another option is to invoke mypyc through tests (see Testing below).
Beyond the mypy documentation, here are some things that are helpful to know for mypyc contributors:
- Experience with C. The C Programming Language is a classic book about C. Note that C is a fairly small language and not too difficult to learn.
- Basic familiarity with the Python C API (see Python C API documentation). Extending and Embedding the Python Interpreter is a good tutorial for beginners.
- Basics of compilers (see the mypy wiki for some ideas)
All of these limitations will likely be fixed in the future:
- We don't detect stack overflows in compiled code.
- We don't handle Ctrl-C in compiled code.