Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mathematical Set Validation with regular-expression

Tags:

java

regex

I need to validate a user given String and validate that it is a valid Set, possibly a set that contains inner sets. Examples:

 1) {1, 2, 3, 4} = valid
 2) {1, 2, {3, 4}, 5} = valid
 3) 1, 2, 3, 4 = invalid (missing brackets)
 4) {1, 2, {3, 4, 5} = invalid (missing inner bracket)

This is the regex I am using (broken up for readability):

String elementSeparator = "(,\\s)?";
String validElement = "(\\{?[A-Za-z0-9]*\\}?" + elementSeparator + ")*";
String regex = "^\\{" + validElement + "\\}$";

Currently the it accepts Sets with optional opening and closing brackets, but I need it to only accept if they are both there, and not if an inner set is missing a bracket. In my current implementation the 4th example is accepted as a valid set.

How can I accomplish this?

like image 237
unmuse Avatar asked Oct 31 '12 18:10

unmuse


1 Answers

Here's some Java pseudo-code for how to approach this problem without using any heavyweight tools such as ANTLR. The basic approach is to split the input into tokens consisting of

  1. A single open or close brace
  2. A comma
  3. Whitespace
  4. An identifier

Then you scan through the tokens, keeping track of the nesting level as you go. If when you get to the end the nesting level isn't zero, the input string has an unbalanced brace.

Pattern token = Pattern.compile("([{}]|,|[A-Aa-z0-9]+|\s+)");
int nesting = 0
Matcher m = token.matcher(inputString);
while(m.find())
{
    if (m.group(1).equals("{")
        nesting++;
    else if (m.group(1).equals("}")
    {
        nesting--;
        if (nesting < 0)
            error - too many right braces
    }
    else
        ....
}
if (nesting != 0) 
    log("incorrect nesting");

Once you have this framework in place you can enhance it to detect things like two commas in a row: set a flag when you see a comma, clear the flag when you see an identifier (but not whitespace). In the branch for comma and close brace you test the flag and issue an error message since a comma at that point is not valid. And so on, for whatever validation you need.

Note that my pseudocode above is not a complete solution, just intended to give you the general approach. A complete solution would be somewhat more involved as it would have to deal with invalid characters, making the lexer (the part that breaks up the string into tokens) more complex.

like image 155
Jim Garrison Avatar answered Nov 03 '22 15:11

Jim Garrison