Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Call Python or Lua from C++ to evaluate an expression, calculating unknown variables only if needed

I have an expression/formula like this

 std::string expr="((A>0) && (B>5 || C > 10))";

I have done some research and it seems that if A,B,C values are known, by embedding Lua or Python in a C++ program, there are eval functions that can substitute A,B and C and return true or false.

But what happens when I don't know all the values? Let's say that A is known and it is -1. If A is -1 then the formula will evaluate to "false" irrespective of the values of B or C.

Can I evaluate a formula without knowing all the variables in advance? For example if A is 10, it makes sense to lookup for the value of B and re-evaluate again. How can we solve these problems? Ideas?

like image 425
cateof Avatar asked Apr 20 '17 10:04

cateof


11 Answers

I don't know of any existing available library for handling this.

The usual approach would be to build an expression-tree and evaluate what is possible - similarly to constant folding in compilers: https://en.wikipedia.org/wiki/Constant_folding

One of the important aspects for that is to know the allowed values for variables, and thus the allowed partial evaluations, e.g. x*0 (and 0*x) is 0 if x is an integer, or finite floating point number, but cannot be evaluated if x is an IEEE floating number (since it could be Nan or infinity), or if x could be matrix since [1,1]*0 is [0,0] not the scalar 0.

like image 69
Hans Olsson Avatar answered Oct 23 '22 16:10

Hans Olsson


One way is to parse the expression into a tree and evaluate the tree. Subexpressions for which all variables are known will be fully evaluated. The effect will be to simplify the tree.

In your example, the tree has && at the top with two subtrees, the left one being the tree for A>0. To evaluate the tree, we evaluate the left subtree, which returns -1, and so we don't need to evaluate the right subtree, because the operator is &&. The whole tree evaluates to false.

like image 32
lhf Avatar answered Oct 23 '22 14:10

lhf


I don't understand exactly what you want to do or understand but I am OK with ivan_pozdeev about short-circuit evaluation and lazy evaluation.

A boolean expression is evaluate from the left to the right and when the result is known the evaluation stop and ignore what is on the right.

With Python:

E = "(A > 0) and (B > 5 or C > 10)"
A = -1
print(eval(E))

gives

False

But

E = "(A > 0) and (B > 5 or C > 10)"
A = 1
print(eval(E))

gives the error "name 'B' is not defined".

like image 40
profbejian Avatar answered Oct 23 '22 14:10

profbejian


So from what I understand of your question, you want something like

if (A>0) {
  B = getB();
  C = getC();
  if (B>23 || C==11)
    explode();
}

ie your expression must be split up so that you are only ever working with known values.

like image 41
UKMonkey Avatar answered Oct 23 '22 16:10

UKMonkey


You can do it like this:

class LazyValues():

    def __init__(self):
        self._known_values = {}

    def __getitem__(self, var):
        try:
            return self._known_values[var]
        except KeyError:
            print("Evaluating %s..." % var)
            return self._known_values.setdefault(var, eval(var))


def lazy_eval(expr, lazy_vars):
    for var in lazy_vars:
        expr  = expr.replace(var, "lazy_values['%s']" % var)
        # will look like ((lazy_value['A']>0) && (lazy_value['B']>5 || lazy_value['C'] > 10))

    lazy_values = LazyValues()
    return eval(expr)


lazy_eval("((A>0) and (B>5 or C > 10))", lazy_vars=['A', 'B', 'C'])

# Evaluating A...
# ....
# NameError: name 'A' is not defined

A = -1
lazy_eval("((A>0) and (B>5 or C > 10))", lazy_vars=['A', 'B', 'C'])

#Evaluating A...
#False

A = 5
B = 6
lazy_eval("((A>0) and (B>5 or C > 10))", lazy_vars=['A', 'B', 'C'])

# Evaluating A...
# Evaluating B...
# True

More details later...

like image 22
Thierry Lathuille Avatar answered Oct 23 '22 16:10

Thierry Lathuille


It seems to me that the answer is yes, yes you can try to evaluate the expression with missing information. You will need to define what happens though when a symbol lookup fails.

In you case, you will need a boolean expression evaluator and a symbol table so the evaluator can lookup the symbols to execute the expression.

If you succeed in looking up all the symbols, the result would be true or false. If you fail to look up a symbol, then handle that case, perhaps returning None, nullptr, or raising / throwing an exception.

I believe you can embed the python interpreter in your c++ program and call a function to evaluate the expression, more importantly, you can give it the dict to use as the symbol table. If the call returns a result, it was able to find enough symbols or shortcut to a result, otherwise it will raise an exception which your c++ code can detect.

You can prototype the function in python the evaluate if the approach works they way you want, then embed.

Or you can do it all in c++, with a grammar, lexer, parser, and elevator.

like image 35
Nick Avatar answered Oct 23 '22 15:10

Nick


While this is a very crude implementation of your solution, but it suits your situation perfectly although using a lot of if else and Exception Handling.

def main_func():
    def checker(a, b=None, c=None):
        if a is None:
            del a
        if b is None:
            del b
        if c is None:
            del c
        d = eval('a>0 and (b>5 or c>10)')
        return d
    return checker

def doer(a=None, b=None, c=None):
    try:
        return main_func()(a,b,c)
    except NameError as e:
        if "'a' is not" in str(e):
            return 'a'
        elif "'b' is not" in str(e):
            return 'b'
        elif "'c' is not" in str(e):
            return 'c'

def check_ret(ret):
    return type(ret) == bool

def actual_evaluator():
    getter = {
        "a": get_a,
        "b": get_b,
        "c": get_c
    }
    args = []
    while True:
        ret = doer(*tuple(args))
        if not check_ret(ret):
            val = getter[ret]()
            args.append(val)
        else:
            return ret

if __name__ == '__main__':
        print actual_evaluator()

Now explaining my code, main_func returns another function which is used to evaluate the given expression in a string. While here the string has been hardcoded, you can always pass it as a parameter to the function and replace the string inside eval with the parameter.

In doer, the function returned by main_func is invoked and if a NameError is thrown, which happens in case of the previous conditions being false and new values to be calculated then it returns the specific variable which needs to be calculated. All of that is checked in the actual_evaluator where the values of the variables are fetched via some function get_variable_name which you can define in your getter dict. In my code, I'd used random numbers to check the validity, but like you said you'd have to evaluate the various variables by other means so you can call the respective functions.

like image 39
Indranil Dutta Avatar answered Oct 23 '22 14:10

Indranil Dutta


Due to the short-circuit behaviour, Python can evaluate an expression even without all the contained values defined if it's possible. If not, it raises an exception:

In [1]: a= False

In [3]: a and b
Out[3]: False

In [4]: a or b
NameError: name 'b' is not defined

But the expression is evaluated left to right:

In [5]: b and a
NameError: name 'b' is not defined
like image 23
ivan_pozdeev Avatar answered Oct 23 '22 14:10

ivan_pozdeev


I have done a "roll-my-own" approach for this in the past. It's not that difficult for simple things; you simply create your own objects that implement magic math methods and that keep track of other objects.

If you need something more full-featured, the sympy project is designed to do symbolic math...

like image 31
Patrick Maupin Avatar answered Oct 23 '22 16:10

Patrick Maupin


I'd take a look at sympy or other computer algebra systems. I believe that algebraic simplification of the pxeression plus short circuit evaluation will allow you to evaluate all the cases where it is possible to get a result. There are some cases where you need to know the value of some variable. For example if you have a simple expression like a == b, you are going to make no progress without knowing the value of a and b. However something like (a >= 0) ||(a <= 0), algebraic simplification will result in true assuming that a is not NAN or some other value that is not equal to itself.

like image 43
Sam Hartman Avatar answered Oct 23 '22 15:10

Sam Hartman


It sounds like you have two challenges:

  1. It is expensive to calculate some variable values, so you want to avoid calculating values that aren't needed to evaluate the expression; and
  2. Your expression exists as a string, composed at runtime, so you can't use C++'s built-in short-circuiting logic.

This means you need some way to evaluate an expression at runtime, and you would like to take advantage of short-circuit logic if possible. Python could be a good choice for this, as shown in the example below.

There is a short Python script (evaluate.py), which defines an evaluate() function which can be called from your C or C++ program. The evaluate() function will attempt to evaluate the expression you give it (translating "&&" and "||" to "and" and "or" if needed). If it requires a variable which hasn't been defined yet, it will retrieve a value for that variable by calling a get_var_value() function defined in the C/C++ program (and then cache the value for later use).

This approach will use normal short-circuit behavior, so it will only request the variable values needed to finish evaluating the expression. Note that this won't rearrange the expression to choose the minimal set of variables needed to evaluate it; it just uses the standard short-circuiting behavior.

UPDATE: I've added an example at the end that defines the Python script using a multiline string literal in the .cpp file. This could be useful if you don't want to install a separate evaluate.py file along with your executable. It also simplifies the Python initialization a little bit.

The C/Python interaction in the scripts below is based on code in https://docs.python.org/2/extending/embedding.html and https://docs.python.org/2/c-api/arg.html.

Here are the files:

evaluate.py (Python script)

# load embedded_methods module defined by the parent C program
from embedded_methods import get_var_value

# define a custom dictionary class that calls get_var_value(key) for any missing keys.
class var_dict(dict):
    def __missing__(self, var):
        self[var] = val = get_var_value(var)
        return val

# define a function which can be called by the parent C program
def evaluate(expr):
    # Create a dictionary to use as a namespace for the evaluation (this version 
    # will automatically request missing variables).
    # Move this line up to the module level to retain values between calls.
    namespace = var_dict()

    # convert C-style Boolean operators to Python-style
    py_expr = expr.replace("||", " or ").replace("&&", " and ").replace("  ", " ")

    print('evaluating expression "{}" as "{}"'.format(expr, py_expr))

    # evaluate the expression, retrieving variable values as needed
    return eval(py_expr, namespace)

evaluate.c (your main program; could also be evaluate.cpp, compiled with g++)

// on Mac, compile with gcc -o evaluate evaluate.c -framework Python
#include <Python/Python.h>  // Mac
// #include <Python.h> // non-Mac?

// retain values of argc and argv for equation evaluation
int argc;
char **argv;

/* 
   Calculate the value of a named variable; this is called from the Python 
   script to obtain any values needed to evaluate the expression. 
*/
static PyObject* c_get_var_value(PyObject *self, PyObject *args)
{
    int var_num;
    char *var_name;
    char err_string[100];
    long var_value;
    if(!PyArg_ParseTuple(args, "s:get_var_value", &var_name)) {
        PyErr_SetString(PyExc_ValueError, "Invalid arguments passed to get_var_value()");
        return NULL;
    }
    // change the code below to define your variable values
    // This version just assumes A, B, C are given by argv[2], argv[3], argv[4], etc.
    printf("looking up value of %s: ", var_name);
    var_num = var_name[0]-'A';
    if (strlen(var_name) != 1 || var_num < 0 || var_num >= argc-2) {
        printf("%s\n", "unknown");
        snprintf(
            err_string, sizeof(err_string), 
            "Value requested for unknown variable \"%s\"", var_name
        );
        PyErr_SetString(PyExc_ValueError, err_string);
        return NULL;  // will raise exception in Python
    } else {
        var_value = atoi(argv[2+var_num]);
        printf("%ld\n", var_value);
        return Py_BuildValue("l", var_value);
    }
}

// list of methods to be added to the "embedded_methods" module
static PyMethodDef c_methods[] = {
    {"get_var_value", c_get_var_value, METH_VARARGS, // could use METH_O
     "Retrieve the value for the specified variable."},
    {NULL, NULL, 0, NULL} // sentinel for end of list
};

int main(int ac, char *av[])
{
    PyObject *p_module, *p_evaluate, *p_args, *p_result;
    long result;
    const char* expr;

    // cache and evaluate arguments
    argc = ac;
    argv = av;
    if (argc < 2) {
        fprintf(
            stderr, 
            "Usage: %s \"expr\" A B C ...\n"
            "e.g.,  %s \"((A>0) && (B>5 || C > 10))\" 10 9 -1\n", 
            argv[0], argv[0]
        );
        return 1;
    }
    expr = argv[1];

    // initialize Python
    Py_SetProgramName(argv[0]);
    Py_Initialize();
    // Set system path to include the directory where this executable is stored
    // (to find evaluate.py later)
    PySys_SetArgv(argc, argv);

    // attach custom module with get_var_value() function
    Py_InitModule("embedded_methods", c_methods);

    // Load evaluate.py
    p_module = PyImport_ImportModule("evaluate");
    if (PyErr_Occurred()) { PyErr_Print(); }
    if (p_module == NULL) {
        fprintf(stderr, "unable to load evaluate.py\n");
        return 1;
    }

    // get a reference to the evaluate() function
    p_evaluate = PyObject_GetAttrString(p_module, "evaluate");
    if (!(p_evaluate && PyCallable_Check(p_evaluate))) {
        fprintf(stderr, "Cannot retrieve evaluate() function from evaluate.py module\n");
        return 1;
    }

     /*
        Call the Python evaluate() function with the expression to be evaluated.
        The evaluate() function will call c_get_var_value() to obtain any
        variable values needed to evaluate the expression. It will use 
        caching and normal logical short-circuiting to reduce the number 
        of requests.
     */
    p_args = Py_BuildValue("(s)", expr);
    p_result = PyObject_CallObject(p_evaluate, p_args);
    Py_DECREF(p_args);
    if (PyErr_Occurred()) {
        PyErr_Print();
        return 1;
    }
    result = PyInt_AsLong(p_result);
    Py_DECREF(p_result);

    printf("result was %ld\n", result);

    Py_DECREF(p_evaluate);
    Py_DECREF(p_module);
    return 0;
}

Results:

$ evaluate "((A>0) && (B>5 || C > 10))" -1 9 -1
evaluating expression "((A>0) && (B>5 || C > 10))" as "((A>0) and (B>5 or C > 10))"
looking up value of A: -1
result was 0

$ evaluate "((A>0) && (B>5 || C > 10))" 10 9 -1
evaluating expression "((A>0) && (B>5 || C > 10))" as "((A>0) and (B>5 or C > 10))"
looking up value of A: 10
looking up value of B: 9
result was 1

$ evaluate "((A>0) && (B>5 || C > 10))" 10 3 -1
evaluating expression "((A>0) && (B>5 || C > 10))" as "((A>0) and (B>5 or C > 10))"
looking up value of A: 10
looking up value of B: 3
looking up value of C: -1
result was 0

As an alternative, you can combine all this code into a single .cpp file, as shown below. This uses the multi-line string literal capability in C++11.

Self-contained evaluate.cpp

// on Mac, compile with g++ evaluate.cpp -o evaluate -std=c++11 -framework Python
#include <Python/Python.h>  // Mac
//#include <Python.h> // non-Mac?

/* 
   Python script to be run in embedded interpreter.
   This defines an evaluate(expr) function which will interpret an expression
   and return the result. If any variable values are needed, it will call the
   get_var_values(var) function defined in the parent C++ program
*/
const char* py_script = R"(
# load embedded_methods module defined by the parent C program
from embedded_methods import get_var_value

# define a custom dictionary class that calls get_var_value(key) for any missing keys.
class var_dict(dict):
    def __missing__(self, var):
        self[var] = val = get_var_value(var)
        return val

# define a function which can be called by the parent C program
def evaluate(expr):
    # Create a dictionary to use as a namespace for the evaluation (this version 
    # will automatically request missing variables).
    # Move this line up to the module level to retain values between calls.
    namespace = var_dict()

    # convert C-style Boolean operators to Python-style
    py_expr = expr.replace("||", " or ").replace("&&", " and ").replace("  ", " ")

    print('evaluating expression "{}" as "{}"'.format(expr, py_expr))

    # evaluate the expression, retrieving variable values as needed
    return eval(py_expr, namespace)
)";

// retain values of argc and argv for equation evaluation
int argc;
char **argv;

/* 
   Calculate the value of a named variable; this is called from the Python 
   script to obtain any values needed to evaluate the expression. 
*/
static PyObject* c_get_var_value(PyObject *self, PyObject *args)
{
    int var_num;
    char *var_name;
    char err_string[100];
    long var_value;
    if(!PyArg_ParseTuple(args, "s:get_var_value", &var_name)) {
        PyErr_SetString(PyExc_ValueError, "Invalid arguments passed to get_var_value()");
        return NULL;
    }
    // change the code below to define your variable values
    // This version just assumes A, B, C are given by argv[2], argv[3], argv[4], etc.
    printf("looking up value of %s: ", var_name);
    var_num = var_name[0]-'A';
    if (strlen(var_name) != 1 || var_num < 0 || var_num >= argc-2) {
        printf("%s\n", "unknown");
        snprintf(
            err_string, sizeof(err_string), 
            "Value requested for unknown variable \"%s\"", var_name
        );
        PyErr_SetString(PyExc_ValueError, err_string);
        return NULL;  // will raise exception in Python
    } else {
        var_value = atoi(argv[2+var_num]);
        printf("%ld\n", var_value);
        return Py_BuildValue("l", var_value);
    }
}

// list of methods to be added to the "embedded_methods" module
static PyMethodDef c_methods[] = {
    {"get_var_value", c_get_var_value, METH_VARARGS, // could use METH_O
     "Retrieve the value for the specified variable."},
    {NULL, NULL, 0, NULL} // sentinel for end of list
};

int main(int ac, char *av[])
{
    PyObject *p_module, *p_evaluate, *p_args, *p_result;
    long result;
    const char* expr;

    // cache and evaluate arguments
    argc = ac;
    argv = av;
    if (argc < 2) {
        fprintf(
            stderr, 
            "Usage: %s \"expr\" A B C ...\n"
            "e.g.,  %s \"((A>0) && (B>5 || C > 10))\" 10 9 -1\n", 
            argv[0], argv[0]
        );
        return 1;
    }
    expr = argv[1];

    // initialize Python
    Py_SetProgramName(argv[0]);
    Py_Initialize();

    // attach custom module with get_var_value() function
    Py_InitModule("embedded_methods", c_methods);

    // run script to define evalute() function
    PyRun_SimpleString(py_script);
    if (PyErr_Occurred()) {
        PyErr_Print(); 
        fprintf(stderr, "%s\n", "unable to run Python script");
        return 1;
    }

    // get a reference to the Python evaluate() function (can be reused later)
    // (note: PyRun_SimpleString creates objects in the __main__ module)
    p_module = PyImport_AddModule("__main__");
    p_evaluate = PyObject_GetAttrString(p_module, "evaluate");
    if (!(p_evaluate && PyCallable_Check(p_evaluate))) {
        fprintf(stderr, "%s\n", "Cannot retrieve evaluate() function from __main__ module");
        return 1;
    }

    /*
       Call the Python evaluate() function with the expression to be evaluated.
       The evaluate() function will call c_get_var_value() to obtain any
       variable values needed to evaluate the expression. It will use 
       caching and normal logical short-circuiting to reduce the number 
       of requests.
    */
    p_args = Py_BuildValue("(s)", expr);
    p_result = PyObject_CallObject(p_evaluate, p_args);
    Py_DECREF(p_args);
    if (PyErr_Occurred()) {
        PyErr_Print();
        return 1;
    }
    result = PyInt_AsLong(p_result);
    Py_DECREF(p_result);

    printf("result was %ld\n", result);

    Py_DECREF(p_module);
    Py_DECREF(p_evaluate);
    return 0;
}
like image 38
Matthias Fripp Avatar answered Oct 23 '22 14:10

Matthias Fripp