
Good Design For Regex, Capture Groups And Unit Testing

In a project I'm experimenting with using regular expressions to distinguish between various types of sentences and map them to functions to handle these sentences.

Most of these sentence handling functions take arguments from the sentence itself, parsed out by capture groups in the regular expression.

Ex: "I paid $20 for 2 cookies" is matched by one of the regular expressions in my parse tree (a dictionary). The regex extracts $20 as the group "payment" and 2 as the group "amount". Currently I map to the correct handler and call it as follows:

foreach(KeyValuePair<Regex, Type> pair in sentenceTypes)
{
    Match match = pair.Key.Match(text);
    if(match.Success)
    {
        IHandler handler = handlerFactory.CreateHandler(pair.Value);
        output = handler.Handle(match);
    }
}

Example of a simple handler class.

public class NoteCookiePriceHandler : IHandler
{
    public string Handle(Match match)
    {
        // Read the named capture groups and convert them to numbers.
        double payment = Convert.ToDouble(match.Groups["payment"].Value);
        int amount = Convert.ToInt32(match.Groups["amount"].Value);

        double price = payment / amount;
        return "The price is $" + price;
    }
}

I was trying to set up some unit tests with Moq when I realized I can't actually mock a Match object, nor a Regex. Thinking about it more, the design seems somewhat flawed in general, as I am depending on named groups being correctly parsed and handed to the handler class without a good interface.

I am looking for suggestions on a more effective design to use in passing parameters correctly to a mapped handler function/class, as passing the Match object seems problematic.

Failing that, any help in figuring out a way to mock Regex or Match effectively would be appreciated, and would at least solve my short-term problem. They both lack default constructors, so I am having a hard time getting Moq to create objects of them.

Edit: I ended up solving at least the mocking problem by passing a dictionary of strings for my match groups, rather than the (un-Moq-able) match object itself. I'm not particularly happy with this solution, so recommendations would still be appreciated.

foreach(KeyValuePair<Regex, Type> pair in sentenceTypes)
{
    match = pair.Key.Match(text);
    if(match.Success)
    {
        IHandler handler = handlerFactory.CreateHandler(pair.Value);
        // Copy every capture group into a plain dictionary of strings.
        foreach (string groupName in pair.Key.GetGroupNames())
        {
            matchGroups.Add(groupName, match.Groups[groupName].Value);
        }
        interpretation = handler.Handle(matchGroups);
    }
}
glockman asked Nov 13 '22 05:11



1 Answer

One way to avoid bad design is to start from the principles of good design rather than from the problem you want to solve. This is one of the reasons test-driven development is so powerful in improving code quality. This way of thinking existed long before TDD, though, under the name design by contract. Allow me to demonstrate:

What would you like the ideal handler to look like? How about this:

interface IHandler {
    String handle();
}

Implementation:

public class NoteCookiePriceHandler : IHandler
{  
    private double payment;
    private int amount;

    public NoteCookiePriceHandler(double payment, int amount) {
        this.payment = payment;
        this.amount = amount;
    }

    public String handle() {
        return "The price is $" + payment / amount;
    }
}
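
Because a handler of this shape receives already-parsed values, it can be checked without Regex, Match or Moq. The following is only a sketch of such a check: the test class name and the plain exception-based assertion are assumptions made for illustration, and in practice you would use your own test framework.

```csharp
using System;

public interface IHandler
{
    String handle();
}

public class NoteCookiePriceHandler : IHandler
{
    private double payment;
    private int amount;

    public NoteCookiePriceHandler(double payment, int amount)
    {
        this.payment = payment;
        this.amount = amount;
    }

    public String handle()
    {
        return "The price is $" + payment / amount;
    }
}

// Hypothetical, framework-free test: no regex and no mocks are involved.
public static class NoteCookiePriceHandlerTest
{
    public static void Main()
    {
        // Arrange: pass typed values directly, as the parser eventually would.
        IHandler handler = new NoteCookiePriceHandler(20.0, 2);

        // Act
        string result = handler.handle();

        // Assert: 20 / 2 = 10
        if (result != "The price is $10")
            throw new Exception("Unexpected result: " + result);
        Console.WriteLine("ok");
    }
}
```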

Now, starting with this ideal design (perhaps along with the tests for it), how can we get the sentence input sent to the handlers? Well, all problems in computer science can be solved with another layer of indirection. Let's say the sentence parser does not create the handler directly, but uses a factory to create one:

interface HandlerFactory<T> where T: IHandler  {
    // A dictionary, since a sentence usually yields several capture groups.
    T getHandler(IDictionary<String, String> captureGroups);
}

You could then create one factory per handler, but soon enough you would find a way to create a generic factory. Using reflection, for example, you could match the capture group names to the constructor parameter names. Based on the data types of the constructor parameters, the generic handler factory could automatically convert the strings to the correct types. This would all be easily testable by creating some fake handlers and asking the factory to populate them from some key/value string inputs.
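
A rough sketch of such a reflection-based factory might look like the following. The class name ReflectionHandlerFactory, the choice of a dictionary from group name to captured string, and the rule of taking the first constructor whose parameter names are all present are assumptions made for illustration, not a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;

public interface IHandler
{
    String handle();
}

public class NoteCookiePriceHandler : IHandler
{
    private double payment;
    private int amount;

    public NoteCookiePriceHandler(double payment, int amount)
    {
        this.payment = payment;
        this.amount = amount;
    }

    public String handle()
    {
        return "The price is $" + payment / amount;
    }
}

// Hypothetical generic factory: maps capture group names onto constructor parameter names.
public class ReflectionHandlerFactory<T> where T : IHandler
{
    public T getHandler(IDictionary<string, string> captureGroups)
    {
        foreach (ConstructorInfo ctor in typeof(T).GetConstructors())
        {
            ParameterInfo[] parameters = ctor.GetParameters();

            // Skip constructors that need a value no capture group provides.
            if (!parameters.All(p => captureGroups.ContainsKey(p.Name)))
                continue;

            // Convert each captured string to the declared parameter type.
            object[] args = parameters
                .Select(p => Convert.ChangeType(
                    captureGroups[p.Name], p.ParameterType, CultureInfo.InvariantCulture))
                .ToArray();

            return (T)ctor.Invoke(args);
        }

        throw new ArgumentException(
            "No constructor of " + typeof(T).Name + " matches the supplied capture groups.");
    }
}

public static class Demo
{
    public static void Main()
    {
        var groups = new Dictionary<string, string>
        {
            { "payment", "20" },
            { "amount", "2" }
        };

        IHandler handler = new ReflectionHandlerFactory<NoteCookiePriceHandler>().getHandler(groups);
        Console.WriteLine(handler.handle()); // The price is $10
    }
}
```

Extra entries in the dictionary, such as the whole-match group "0", are simply ignored. Convert.ChangeType throws if a captured string cannot be converted, for example "$20" for a double parameter, so in this sketch the regex groups should capture only the numeric part.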

Lodewijk Bogaards answered Nov 14 '22 23:11
