pyxl and interpy use a very interesting trick to extend the Python syntax: the coding: declaration from PEP 263.
# coding: pyxl
print <html><body>Hello World!</body></html>
or
# coding: interpy
package = "Interpy"
print "Enjoy #{package}!"
How could I write my own coding: if I wanted to? And could I use more than one?
I'm Syrus, the creator of interpy.
Thanks to codecs and the # coding: your_codec_name declaration, Python gives us a chance to preprocess the file before it is compiled to bytecode.
This is how it works:
At first, Python reads the file and stores its content. As the content could be encoded in a strange format, Python tries to decode it. Here is where the magic happens.
If no coding declaration is found, Python decodes the content with its default source encoding: ASCII in Python 2, UTF-8 in Python 3. This is why you have to write # coding: utf-8 when using non-ASCII characters (á, ñ, Ð, ...) in Python 2, where ASCII is the default.
If we register a custom codec (both encoder and decoder), and a file tells Python it is using our codec (via # coding: codec_name), then Python will decode the file with our codec.
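As a rough illustration of the registration step, here is a minimal sketch in Python 3. The codec name mycodec, the transform function and the made-up shout( keyword are all hypothetical, and this is not how interpy or pyxl are implemented. It only covers the stateless decode function; a codec meant for real source files would probably also need incremental and stream decoders.

```python
import codecs

CODEC_NAME = "mycodec"  # hypothetical codec name


def transform(text):
    # Placeholder preprocessing step: rewrite a made-up "shout(" call.
    return text.replace("shout(", "print(")


def decode(data, errors="strict"):
    # Decode the raw bytes as UTF-8 first, then rewrite the resulting text.
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    return transform(text), consumed


def search(name):
    # Search functions are asked about every codec name; answer only for ours.
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        name=CODEC_NAME,
        encode=utf8.encode,  # encoding stays plain UTF-8
        decode=decode,
    )


codecs.register(search)

decoded = codecs.decode(b"shout('hi')", CODEC_NAME)
```

After registration, decoded holds the rewritten text, which is the same hook Python uses when it meets a coding declaration naming the codec.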
To register the codec without needing an explicit import, we create a path configuration file (.pth), which Python processes at startup and which registers the codec before any non-main module is executed.
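The .pth mechanism can be tried out without touching a real installation. The sketch below uses a temporary directory as a stand-in for site-packages, and the file names mycodec.pth and mycodec_register.py are hypothetical. It relies on the documented behavior of the site module: a line in a .pth file that starts with import is executed.

```python
import os
import site
import sys
import tempfile

# Stand-in for site-packages; a real package would install these two files there.
demo_dir = tempfile.mkdtemp()

# Hypothetical module; a real one would call codecs.register(...) at import time.
with open(os.path.join(demo_dir, "mycodec_register.py"), "w") as f:
    f.write("REGISTERED = True\n")

# The .pth file itself: a single line starting with "import" is executed.
with open(os.path.join(demo_dir, "mycodec.pth"), "w") as f:
    f.write("import mycodec_register\n")

# Process the directory the same way Python treats site-packages at startup.
site.addsitedir(demo_dir)

hook_loaded = "mycodec_register" in sys.modules
```

In a real package the installer copies the .pth file into site-packages, so no call to site.addsitedir is needed.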
Once the decoder of our codec is called, we can modify the output however we want, but how do we recognize Python syntax (tokens) inside this content?
Simply call the Python tokenizer with the file contents and modify the desired tokens.
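To see why the tokenizer helps, here is a small sketch using the standard tokenize module. The sample source line is made up; it contains the same word as a name, as a string and inside a comment, and the tokenizer tells the three apart.

```python
import io
import tokenize

# Made-up sample: "hello" appears as a name, a string and in a comment.
source = 'hello = "hello"  # hello\n'

# generate_tokens expects a readline callable that returns text lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Pair each token's type name with its text, e.g. ("STRING", '"hello"').
kinds = [(tokenize.tok_name[tok.type], tok.string) for tok in tokens]

# Keep only the string literals, which is all a string-rewriting codec touches.
strings = [tok.string for tok in tokens if tok.type == tokenize.STRING]
```

A plain text search for hello would hit all three places, while filtering on tokenize.STRING leaves the name and the comment alone.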
In the case of interpy, it changes the behavior only when Python strings are found in the file content.
Once we transform the content, we send it back to the Python compiler to be compiled to bytecode.
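The whole transform-then-compile step can be sketched as follows. This is not interpy's actual implementation, just a simplified Python 3 illustration of the idea: the helper names interpolate and preprocess are hypothetical, and it assumes string literals contain no braces other than the #{...} markers.

```python
import io
import re
import tokenize

# Matches interpy-style markers such as #{package}.
INTERP = re.compile(r"#\{(.*?)\}")


def interpolate(literal):
    # '"Enjoy #{package}!"' becomes '("Enjoy {}!".format(package))'.
    exprs = INTERP.findall(literal)
    if not exprs:
        return literal
    template = INTERP.sub("{}", literal)
    return "({}.format({}))".format(template, ", ".join(exprs))


def preprocess(source):
    # Rewrite only STRING tokens; every other token passes through untouched.
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            tok = tok._replace(string=interpolate(tok.string))
        tokens.append(tok)
    return tokenize.untokenize(tokens)


source = 'package = "Interpy"\nmessage = "Enjoy #{package}!"\n'
rewritten = preprocess(source)

# Hand the rewritten source to the compiler, as Python does after decoding.
namespace = {}
exec(compile(rewritten, "<demo>", "exec"), namespace)
```

Here the compile and exec calls stand in for what the interpreter does on its own once a codec's decoder returns the rewritten text.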
Hope you find this useful!