Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse human-entered logical properties, e.g. this and (that or the other)?

Tags:

c#

.net

parsing

As part of an upcoming project, I'd like to set it up so that a certain domain object can apply to tags or combinations of tags.

I'd like to be able to have users enter these combinations in a human-readable way, similar to:

  • tag-a and (tag-b or tag-c) --> applies to tag-a+tag-b or tag-a+tag-c
  • tag-d or (tag-e and tag-f) --> applies to tag-d or tag-e+tag-f

Does a tool-set exist to do this kind of logic parsing from one string of entered text? I could define the tags behind the scenes with a certain distinction ({}, [], etc.) so that they can be parsed out more easily as well.

Just wondering what the best way is to parse that human-readable text into those distinct sets of combinations without the user having to enter each specific combination.

Thanks!

like image 630
SeanKilleen Avatar asked Sep 17 '12 18:09

SeanKilleen


1 Answers

Typically this involves two steps: lexing (short for lexical analysis) and parsing.

In the first step, the input string is transformed into a sequence of lexical items, called tokens. For this purpose, you might declare an enum type for the different types of token, e.g.:

public enum TokenType
{
    OpenParenthesis,
    CloseParenthesis,
    And,
    Or,
    Tag
}

and a class for tokens:

sealed class Token
{
    public TokenType Type { get; private set; }
    public string Item { get; private set; }
    public Token(TokenType type, string item) { Type = type; Item = item; }
}

Now you write an algorithm that turns the input string, e.g. tag-a and (tag-b or tag-c), into a sequence of Token instances. You can use regular expressions to recognise the various items, for example @"\s*\(\s*" would be the regular expression to recognise open-parentheses. The finished sequence would then look something like this:

  • new Token(TokenType.Tag, "tag-a")
  • new Token(TokenType.And, null)
  • new Token(TokenType.OpenParenthesis, null)
  • new Token(TokenType.Tag, "tag-b")
  • new Token(TokenType.Or, null)
  • new Token(TokenType.Tag, "tag-c")
  • new Token(TokenType.CloseParenthesis, null)

Once you have this sequence, you need to run a parser on it. There are many strategies for parsing expressions like these; for the beginning, I recommend to you the recursive descent parser.

You will, of course, need a few classes to contain the parse tree:

abstract class Node { }
enum BooleanOperator { And, Or }
sealed class BooleanNode : Node
{
    public BooleanOperator Operator { get; private set; }
    public Node Left { get; private set; }
    public Node Right { get; private set; }
    public BooleanNode(BooleanOperator op, Node left, Node right)
    {
        Operator = op;
        Left = left;
        Right = right;
    }
}
sealed class TagNode : Node
{
    public string Tag { get; private set; }
    public TagNode(string tag) { Tag = tag; }
}

And then a recursive descent parser might look something like this:

public static Node ParseExpression(Token[] tok)
{
    int i = 0;
    return parseExpressionBoolOr(tok, ref i);
}
private static Node parseExpressionBoolOr(Token[] tok, ref int i)
{
    var left = parseExpressionBoolAnd(tok, ref i);
    while (tok[i].Type == TokenType.Or)
    {
        i++;
        var right = parseExpressionBoolAnd(tok, ref i);
        left = new BooleanNode(BooleanOperator.Or, left, right);
    }
    return left;
}
private static Node parseExpressionBoolAnd(Token[] tok, ref int i)
{
    var left = parseExpressionPrimary(tok, ref i);
    while (tok[i].Type == TokenType.And)
    {
        i++;
        var right = parseExpressionPrimary(tok, ref i);
        left = new BooleanNode(BooleanOperator.And, left, right);
    }
    return left;
}
private static Node parseExpressionPrimary(Token[] tok, ref int i)
{
    if (tok[i].Type == TokenType.OpenParenthesis)
    {
        i++;
        var node = parseExpressionBoolOr(tok, ref i);
        if (tok[i].Type != TokenType.CloseParenthesis)
            throw new InvalidOperationException();  // or customised parse exception
        return node;
    }
    else if (tok[i].Type == TokenType.Tag)
    {
        var node = new TagNode(tok[i].Item);
        i++;
        return node;
    }
    else
        throw new InvalidOperationException();  // or customised parse exception
}

Note that this is a greatly simplified example. However, it is maximally flexible: you can extend this algorithm to parse absolutely any language you want.

like image 167
Timwi Avatar answered Oct 30 '22 01:10

Timwi