I want to be able to perform code generation of python given an AST description.
I've done static analysis of C and built AST visitors in python, so I feel relatively comfortable manipulating a syntax tree, but I've never attempted code generation before and am trying to determine the best practice for generating python code.
Specifically, I'd love pointers on how automatic code generation is typically done, or any pointers to libraries targeting python which could make this task simpler.
My end goal is to attempt something similar to csmith or a tool to make python code compliant with PEP8.
You may want to take a look at the 2to3 tool, developed by the Python core devs to automatically convert Python 2 code to Python 3 code. The tool first parses the code to a tree, and then spits out "fixed" Python 3 code from that tree.
This may be a good place to start because this is an "official" Python tool endorsed by the core developers, and part of the recommended Python 2 to 3 migration path.
Alternatively, check out the codegen.py module, which generates Python code back from Python's ast.
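As a rough illustration of going from a tree back to source, here is a minimal sketch that uses only the standard library's ast module. It assumes Python 3.9 or later, where ast.unparse is available; it is a stand-in for the idea, not a description of how codegen.py works. The helper name build_module and the variable name "answer" are made up for the example.

```python
import ast

def build_module(var_name, value):
    """Hand-build the AST for `<var_name> = <value>` followed by `print(<var_name>)`."""
    assign = ast.Assign(
        targets=[ast.Name(id=var_name, ctx=ast.Store())],
        value=ast.Constant(value=value),
    )
    call = ast.Expr(
        value=ast.Call(
            func=ast.Name(id="print", ctx=ast.Load()),
            args=[ast.Name(id=var_name, ctx=ast.Load())],
            keywords=[],
        )
    )
    module = ast.Module(body=[assign, call], type_ignores=[])
    # Fill in line/column info so the tree can also be passed to compile().
    return ast.fix_missing_locations(module)

tree = build_module("answer", 42)
generated_source = ast.unparse(tree)  # requires Python 3.9+
print(generated_source)
```

The same tree can be handed to compile(tree, "<generated>", "exec") if you want to run the result instead of printing it.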
See also this SO question, which may be relevant to yours (I'm not marking it as a duplicate because I'm not sure the scopes of the questions overlap 100%).
Automatic code generation is commonly done in the following ways:
IMHO, better practice is:
Hardly anybody does the latter, because the tools are mostly not there.
Python's 2to3 tool provides (I think) the target AST and prettyprinting.
But a question you didn't ask is "generate from what?" Somehow you have to specify abstractly what you want generated (or it isn't a win). And your tool has to be able to read that specification somehow.
Many code generation schemes consist of writing procedural code that calls the above generation mechanisms; the procedural code acts as an implicit specification. It is "easy" to read the specification; it is just code in the language used by the code generator.
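To make the "procedural code as implicit specification" idea concrete, here is a small sketch in Python. The loop over field names is the specification; each iteration calls a generation step. The names make_getter and generate_class and the Box example are hypothetical, and the sketch assumes Python 3.9+ for ast.unparse. It parses tiny snippets into AST fragments rather than building every node by hand, which is one possible choice among several.

```python
import ast

def make_getter(field):
    """Generation step: return the AST of a getter method for one field."""
    snippet = f"def get_{field}(self):\n    return self._{field}\n"
    return ast.parse(snippet).body[0]

def generate_class(class_name, fields):
    """Procedural 'specification': one getter is generated per listed field."""
    module = ast.parse(f"class {class_name}:\n    pass\n")
    class_node = module.body[0]
    class_node.body = [make_getter(field) for field in fields]
    return ast.unparse(ast.fix_missing_locations(module))  # Python 3.9+

generated_class_source = generate_class("Box", ["width", "height"])
print(generated_class_source)
```

Changing what gets generated means editing the generator code itself, which is exactly why the specification here is only implicit.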
Some code generation schemes use some kind of graph structure to provide a frame on which fragments of specification are hung, that drive the code generation. UML class diagrams are a classic example. These schemes aren't so easy; you need a "specification reader" (e.g., UML diagram reader aka XMI or some such, or if you aren't using UML, some kind of specification parser), and then you need something to climb over the just-read specification in some useful order (UML is a graph, there are many different ways it can be visited), that makes calls on code generation steps.
The Python 2to3 tool uses a Python2 parser to read the "spec". If you want to generate code from Python2, that will be fine. I suspect you don't want to do that.
A best practice approach is one that unifies the ability to read/analyze/traverse specifications, with the ability to produce ASTs for the target language.
Our DMS Software Reengineering Toolkit is a general purpose program analysis and transformation system. It parses "specifications" (instances of grammars you can define to it) into ASTs; it will also let you build arbitrary ASTs for any of those grammars, using either procedural code [as sketched above] or using pattern-match/replacement (pretty much unique to DMS). Part of a DMS language front end is a prettyprinter, that can regenerate text from ASTs (these are tested by roundtripping code: parse to AST, prettyprint AST, better be the same text).
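The roundtrip check can be illustrated outside DMS. The sketch below uses Python's standard ast module as a stand-in (assuming Python 3.9+ for ast.unparse) and says nothing about how DMS itself works. One caveat: ast.unparse discards comments and original formatting, so this version compares the re-parsed tree with the original tree instead of comparing text. The function name roundtrips is made up for the example.

```python
import ast

def roundtrips(source):
    """Parse, prettyprint, re-parse, and check that the two trees match."""
    original_tree = ast.parse(source)
    regenerated_source = ast.unparse(original_tree)  # Python 3.9+
    regenerated_tree = ast.parse(regenerated_source)
    # ast.dump omits line/column attributes by default, so only structure is compared.
    return ast.dump(original_tree) == ast.dump(regenerated_tree)

sample = "def f(a, b=2):\n    return (a +  b) * 3  # comment is lost on unparse\n"
print(roundtrips(sample))
```

A prettyprinter that preserves comments and layout could be held to the stricter text-equality test described above.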
In case your grammar isn't known to DMS, it has extremely good parser and prettyprinter generators, as well as other support mechanisms for analyzing programs. All that additional machinery is usually not available with classic parser generators, or with just a plain "AST" package. (I don't know what is in 2to3).
The relevance of this to Python is that DMS has a Python front end as well as grammars for many other languages.
So, you can use DMS to parse your specification, and to generate Python code using ASTs, finally followed by prettyprinting.