Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse expression (with custom functions and operations)

I have a string, which contains a custom expression, I have to parse and evaluate:

For example:

(FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3)) 
INTERSECT (FUNCTION_C(5,4,5) UNION FUNCTION_D(3,3))

FUNCTION_X represent functions, which are implemented in C# and return ILists. UNION or INTERSECT are custom functions which should be applied to the lists, which are returned from those functions.

Union and intersect are implemented via Enumerable.Intersect/Enumerable.Union.

How can the parsing and evaluating be implemented in an elegant and expandable manner?

like image 871
red Avatar asked Oct 18 '12 15:10

red


1 Answers

It depends on how complex your expressions will become, how many different operators are going to be available, and a whole number of different variables. Whichever way you do it, you will probably need to first determine a grammar for your mini-language.

For simple grammars, you can just write a custom parser. In the case of many calculators and similar applications, a recursive descent parser is expressive enough to handle the grammar and is intuitive to write. The linked Wikipedia page gives a sample grammar and the implementation of a C parser for it. Eric White also has a blog post on building recursive descent parsers in C#.

For more complex grammars, you will likely want to skip the work of creating this yourself and use a lex/yacc-type lexer and parser toolset. Normally you give as input to these a grammar in EBNF or similar syntax, and they will produce the code necessary to parse the input for you. The parser will typically return a syntax tree which you can traverse, allowing you to apply logic for each token in the input stream (each node in the tree). For C#, I have worked with GPLex and GPPG, but others such as ANTLR are also available.

Basic Parsing Concepts

In general, you want to be able to split each item in the input into a meaningful token, and build a tree based on those tokens. Once the tree is built, you can traverse the tree and perform the necessary action at each node. A syntax tree for FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3) might look like this, where the node types are in capital letters and their values are in parenthesis:

                        PROGRAM
                           |
                           |
                         UNION
                           |
            ------------------------------
           |                              |
  FUNCTION (FUNCTION_A)          FUNCTION(FUNCTION_B)
        |                                 |
  -------------                       ----------
 |      |      |                     |          |
INT(5) INT(4) INT(5)                INT(3)     INT(3)

The parser needs to be smart enough to know that when a UNION is found, it needs to be supplied with two items to union, etc. Given this tree, you would start at the root (PROGRAM) and do a depth-first traversal. At the UNION node, the action would be to first visit all children, and then union the results together. At a FUNCTION node, the action would be to first visit all of the children, find their values, and use those values as parameters to the function, and secondly to evaluate the function on those inputs and return the value.

This would continue for all tokens, for any expression you can come up with. In this way, if you spend the time to get the parser to produce the right tree and each node knows how to perform whatever action it needs to, your design is very extensible and can handle any input that matches the grammar it was designed for.

like image 147
goric Avatar answered Nov 03 '22 02:11

goric