
Create a function from a modular template in python 3.6+ with readable and debuggable code

Do you have an idea how to dynamically create a function from a modular template, where the template code is readable, collected in one place and the resulting function code contains only what's needed and shows correctly in the traceback?

Background

In the context of a simulation framework I want to dynamically create a function that is called very often (let's say more than a million times) during runtime. The function implements a mathematical expression that is to be evaluated, and multiple instances of the function may exist with variations in the actual mathematics and thus in the function code. An example would be the expression a + b*c, with variations a and a + b, but possibly also a**c instead. The actual equation is more complex and can have more disruptive variations.

The different function instances are defined at initialisation of the simulation and all of them are called in each time step. I thus wish to minimize the code executed in each function at runtime and not carry around unnecessary baggage. At the same time, I would like to have all variations in one template instead of duplicating the same code with slight variations multiple times. I do not want to have to check every single duplicate whenever I change the code.

Let's assume, for simplicity, that a, b and c are defined in some outer scope and don't need to be passed explicitly.

A few examples of how I don't want to solve the problem

1: Always evaluate the maximal equation

def full_equation():
    return a + b*c

With this solution I have to look up a, b and c in every time step and calculate the summation and product, even if b and c are not needed at all (i.e. set to 0 and 1, respectively). This is extra computation that I would like to avoid. Also, this way the alternative equation a**c is not covered and needs to be implemented in a different function.


2: Implement every possible variation explicitly

def variant_1():
    return a

def variant_2():
    return a + b

def variant_3():
    return a + b*c

def variant_4():
    return a**c

Next I would implement a chooser function that checks under which conditions which version of the function needs to be used. This solution minimizes the computational effort at runtime but blows up the codebase considerably for more complex expressions and variations with conditional dependencies between them. If I want to do a minor change to the core expression, I have to track down every single variant and check it individually - which may very easily go wrong. That is why I would like to avoid this solution.
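A minimal sketch of such a chooser, to make the trade-off concrete. The variant names, the placeholder values for a, b and c, and the order in which the conditions are checked are assumptions for illustration only:

```python
# Placeholder inputs; in the question these live in some outer scope.
a, b, c = 2, 3, 4

def variant_a():
    return a

def variant_sum():
    return a + b

def variant_sum_product():
    return a + b*c

def variant_power():
    return a**c

def choose_variant(cond_a, cond_b, cond_c):
    """Pick one variant at initialisation, so no branching happens per time step."""
    if cond_a:
        return variant_a
    if cond_b:
        return variant_sum
    if cond_c:
        return variant_sum_product
    return variant_power

# Chosen once; the simulation loop then only calls run_func().
run_func = choose_variant(False, True, False)
result = run_func()  # evaluates a + b
```

The branching cost is paid only once, but every variant still has to be written and maintained by hand, which is the drawback described above.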


3: Check all conditions at runtime

def function_with_lots_of_ifs(cond_a, cond_b, cond_c):
    if cond_a:
        return a
    else:
        if cond_b:
            return a + b
        elif cond_c:
            return a + b*c
        else:
            return a**c

This solution is computationally inefficient, since all conditions need to be checked in every time step. I would like to avoid any if's in the simulation runtime outside initialisation.


My current solution that screws up debugging

What I have resorted to for now is string execution:

def template_builder(cond_a, cond_b, cond_c):

    second_part = ""
    sum_snippet = ""
    product_snippet  = ""

    if not cond_a:
        if cond_b:
            sum_snippet = " + b"
            if cond_c:
                product_snippet = "*c"
            second_part = f"{sum_snippet}{product_snippet}"
        else: 
            second_part = "**c"

    template = f"""
def run_func():
    return a{second_part}"""
    return template

print(template_builder(False, True, False))

The builder returns the string '\ndef run_func():\n    return a + b', which can be executed using exec to define the function run_func. So far so good: all code is in one place and the resulting function includes only the code necessary. The code may be rearranged a bit to improve readability, but the main problem with this solution is debugging it, e.g.:

a = "s"
b = 2
c = 3
run_func()

returns

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-6a3db6ea9fbb> in <module>()
      1 a = "s"
----> 2 run_func()

<string> in run_func()

TypeError: must be str, not int

I can see that some string collided with some int where it shouldn't and that it happened inside my run_func. But I don't know which variant of the function caused the problem and where exactly the error occurred (again, imagine the code may be a lot more complex). Does anyone have a suggestion for how to get a proper display of the code in the traceback, as you would expect in any of the first three solutions, without their respective issues? Also, I have read in a comment to this answer that

Any time you think "I could use exec…" you're almost certainly doing it wrong.

I am open to suggestions on how to do it differently. I have thought about decorators, but couldn't see a way to solve the problem with them. Also note that nested function calls would be computationally inefficient.

asked Apr 20 '18 12:04 by dafrose


2 Answers

Any time you think "I could use exec…" you're almost certainly doing it wrong.

Almost certainly. String metaprogramming is one of those times that it's appropriate to use eval or exec. Even the standard library does this. (See the namedtuple implementation.)
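If you stay with exec, the missing source line in the traceback can usually be recovered by compiling the string under a made-up filename and registering the source in linecache under that same name. The following is a minimal sketch, not a definitive implementation; the helper name define_from_source, the filename label and the scope dict are assumptions for illustration:

```python
import linecache
import traceback

def define_from_source(source, func_name, filename, scope):
    """Exec `source` and return the function `func_name` defined in it.

    `filename` is an arbitrary label. Registering the source under it in
    linecache lets the traceback machinery look up the failing line.
    """
    lines = source.splitlines(keepends=True)
    # mtime=None makes linecache.checkcache() keep the entry instead of
    # trying to stat a file that does not exist.
    linecache.cache[filename] = (len(source), None, lines, filename)
    namespace = dict(scope)  # becomes the globals of the generated function
    exec(compile(source, filename, "exec"), namespace)
    return namespace[func_name]

source = "def run_func():\n    return a + b\n"
run_func = define_from_source(
    source, "run_func", "<generated run_func: a + b>", {"a": "s", "b": 2}
)

try:
    run_func()
except TypeError:
    # The report names the generated "file" and shows the offending line.
    report = traceback.format_exc()
    print(report)
```

Encoding the variant in the filename label also tells you which version of the function failed.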

But there are various other ways to do metaprogramming in Python. Given what your concerns are (performance, debugging), you'll want to use the ast module.

Using ast correctly is more difficult than metaprogramming with strings. There's a lot of incidental complexity you'd have to figure out. So I'd recommend using a library that abstracts this away.
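For comparison, here is a rough sketch of what the plain ast route could look like for the a + b*c example: the maximal equation is the template, and a NodeTransformer prunes operands that a given variant does not need. The class DropOperands, the helper build_variant and the scope dict are hypothetical names, and dropping an operand is assumed to be equivalent to setting it to the neutral element of its operation (0 for +, 1 for *):

```python
import ast

# The maximal equation serves as the single, readable template.
TEMPLATE = "def run_func():\n    return a + b * c\n"

class DropOperands(ast.NodeTransformer):
    """Remove operands of + and * whose variable name is listed as unused."""

    def __init__(self, unused):
        self.unused = set(unused)

    def _is_unused(self, node):
        return isinstance(node, ast.Name) and node.id in self.unused

    def visit_BinOp(self, node):
        self.generic_visit(node)  # simplify nested operations first
        if isinstance(node.op, (ast.Add, ast.Mult)):
            if self._is_unused(node.right):
                return node.left
            if self._is_unused(node.left):
                return node.right
        return node

def build_variant(unused, scope):
    """Compile the pruned template and return the resulting function."""
    tree = DropOperands(unused).visit(ast.parse(TEMPLATE))
    ast.fix_missing_locations(tree)
    namespace = dict(scope)  # becomes the globals of the generated function
    exec(compile(tree, "<ast template>", "exec"), namespace)
    return namespace["run_func"]

scope = {"a": 1, "b": 2, "c": 3}
full = build_variant([], scope)             # a + b * c
no_product = build_variant(["c"], scope)    # a + b
only_a = build_variant(["b", "c"], scope)   # a
```

Even this toy version has to deal with node types and source locations by hand, which is the incidental complexity mentioned above.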

One of the best ast-based Python metaprogramming libraries I know of is Hy. With Hy macros you can build arbitrary functions at compile time using a fairly simple syntax based on s-expressions, a syntax that maps very naturally onto abstract syntax trees.

Here's an example of using Hy macros.

=> (defmacro template-builder [func-name &rest args]
...  `(defn ~func-name[]
...    ~(.format "generated func named {}" func-name)
...    (-> a ~@args)))
from hy import HyExpression, HyList, HySymbol
import hy
hy.macros.macro('template-builder')(lambda hyx_XampersandXname, func_name,
    *args: HyExpression([] + [HySymbol('defn')] + [func_name] + [HyList([])
    ] + ['generated func named {}'.format(func_name)] + [HyExpression([] +
    [HySymbol('->')] + [HySymbol('a')] + list(args or []))]))

<function <lambda> at 0x00000245B90B5400>
=> (template-builder foo)
def foo():
    """generated func named foo"""
    return a


None

=> (template-builder bar (+ b))
def bar():
    """generated func named bar"""
    return a + b


None

=> (template-builder baz (+ b) (* c))
def baz():
    """generated func named baz"""
    return (a + b) * c


None

=> (template-builder quux (+ b) (** c))
def quux():
    """generated func named quux"""
    return (a + b) ** c


None

=>

Thanks a lot for your answer! I went through the docs a bit and tried your examples. Seems like a lot of fun. Do I understand correctly that Hy is meant to be used with its own Lisp-based syntax?

Yes, although this all ultimately translates to Python (ast), it's not always pretty. You could (in principle) use Hy's models from the Python language to implement macros, without writing s-expressions at all, not that I recommend this. $ hy2py foo.hy will show you the Python translation, and $ hy --spy will do it interactively.

Furthermore, only the ast-manipulation part needs to be written in Hy. Hy compiles to Python ast, which CPython then compiles to its own bytecode, so it has transparent interop with Python. You can import and use modules written in Hy from Python code, just like any other Python module. The end user need not even know it is written in Hy.

Either way it seems to me, that generating code with Hy is in itself not very readable (in the Python sense). Is that observation correct?

Lisp is different than Python. Like any language, you have to get used to it, and it's possible to write clear or obfuscated code.

If yes, it would make it more difficult to maintain the resulting library.

Not for the reasons you might think. Python's ast is accessible to the end user via the ast module, but it is considered an implementation detail that is subject to change with every Python version. If you do the ast manipulation yourself, you'll have to keep up with this. This is a major advantage of just using exec with strings.

Hy, on the other hand, guarantees ast compatibility with all of Hy's supported Python versions simultaneously. But Hy itself has been going through changes and does not have a completely stable API yet. If you upgrade your Hy version (which may be necessary to keep up with Python) you may have to adapt your Hy code too. This is probably still easier than writing the ast manipulation yourself.

Since I am working in a scientific context, simplicity/readability of syntax is a rather important feature.

Lisp's syntax is actually much simpler than Python's. That's why it's easier to write macros with. There's no statement/expression distinction. No operator precedence. No indentation levels to keep track of. Everything is a generalized function call with just a little syntactic sugar that expands to these. (E.g. 'foo expands to (quote foo), and you can use the latter in a macro definition if it's easier. There's also quasiquote `, which allows unquote ~ and splicing unquote ~@ inside, which is the easy way to make macro templates.)

S-expressions absolutely must be indented properly to be human readable--so you know which arguments go with which function even when they're nested deeply. (Play with parinfer until you get it.) The rest is just basic familiarity.

You can already read Python's function call syntax spam(eggs, ham, bacon). In Lisp you drop the commas (the grammar is simple enough that spaces are sufficient) spam(eggs ham bacon) then move the opening parenthesis one step earlier (spam eggs ham bacon). That's basically it. Simpler, isn't it?

Hy adds a little more sugar than most other Lisps, like Clojure does, for other data structure types, using the same bracket types as Python would: [1 2 3] for lists, {"a" 1 "b" 2} for dicts, and #{1 2 3} for sets.

There is a semantic distinction between a real function/method which evaluates its arguments first, and a macro/special form which may not. But they're all written the same way, like a function call. The rest is just vocabulary, same as importing any other library.


All that said, Hy is not the only ast manipulation library. Try searching PyPI for "ast" to find others.

answered Oct 13 '22 10:10 by gilch


Short answer, I plan to expand when I have more time:

Regarding the original question, I ended up not creating the functions, but rather defining a domain-specific language (DSL, as suggested by @Gabriel) and parsing it into a simple version of an abstract syntax tree (AST, similar to the answer by @gilch). The equations are in the end parsed into a tensorflow dataflow graph for computational efficiency. All additional info in the DSL is used to configure the dataflow graph. Dataflow graphs can be visualised for debugging using tensorboard. It takes a bit of time to wrap your head around, but as an interactive graphical representation of all your operations, it already helps a lot.

For now tensorflow is the only implemented backend, but we might also implement a backend based on numpy, which might mean that we need to create the functions as originally planned. Gabriel's suggestion for that was to write the functions to file and then load them, so they can easily be inspected and debugged. Gilch's answer using Hy might also work for that.
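The write-to-file idea could look roughly like the sketch below. It is untested against the actual framework; the helper name load_generated_module, the module name and the temporary directory are assumptions for illustration:

```python
import importlib.util
import tempfile
from pathlib import Path

def load_generated_module(source, module_name, directory):
    """Write generated source to <directory>/<module_name>.py and import it."""
    path = Path(directory) / (module_name + ".py")
    path.write_text(source)
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

source = "def run_func():\n    return a + b\n"

with tempfile.TemporaryDirectory() as tmp:
    generated = load_generated_module(source, "generated_variant", tmp)
    # The generated function looks up a and b in its own module namespace.
    generated.a, generated.b = 1, 2
    result = generated.run_func()
```

A temporary directory keeps the example self-contained. For debugging you would more likely write to a persistent directory, so the file still exists when a traceback or debugger needs to display it.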

answered Oct 13 '22 09:10 by dafrose