Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Need to create regular expression in Javascript to check the valid conditional string

I want to create the regular expression in javascript which will check the valid conditional string like

-1 OR (1 AND 2) AND 1

-1 OR (1 AND 2)

-1 OR 2

-1 OR 1 OR 1

-1 AND 1 AND 1

The string should not contain 'AND' and 'OR'. For example - 1 OR 2 AND 3 is invalid. -It should be (1 OR 2) AND 3 or 1 or (2 AND 3).

I tried the following Regex. It works for most of the conditions but it is failing to check the above condition.

/^(\s*\(\d+\s(AND|OR)\s\d+\)|\s*\d+)((\s*(AND|OR)\s*)(\(\d+\s(AND|OR)\s\d+\)|\s*\d+))*$/

Can anyone please help me to sort out the above problem.

like image 276
user1796078 Avatar asked Nov 03 '12 10:11

user1796078


People also ask

How do you check if a string is a valid regular expression?

fullmatch(). This method checks if the whole string matches the regular expression pattern or not. If it does then it returns 1, otherwise a 0.

Is there a regular expression to detect a valid regular expression?

No, if you are strictly speaking about regular expressions and not including some regular expression implementations that are actually context free grammars. There is one limitation of regular expressions which makes it impossible to write a regex that matches all and only regexes.


2 Answers

Forget about regular expressions, they can't do it.

Parser generators to the rescue

With a parser generator, you can create grammar that is both understandable and maintainable.

Here is a parser generator for JavaScript with an online demo.

Grammar

From what I have understood, you want don't want any implicit precedence rules between AND and OR.

Here is an example of what it considers valid:

-1 OR 2 OR (2 AND 2 AND (2 OR (6 AND -2 AND (6 OR 2) AND (6 OR 2)) OR 2 OR 2))

At the moment, the grammar requires/supports

  • "infinite" nesting
  • explicit precedence control with parentheses for AND/OR
  • (multiple) negation of literals
  • whitespace between operands and operators

The grammar can easily be changed to

  • allow arbitrary whitespace
  • optional negation of literals instead of possible multiple negations
  • negation of any subexpression

If you want a more in-depth explanation or can't figure out how to tweak it exactly to your liking, just drop a comment.

Here is your grammar, Simply paste it into the online generator and click Download parser.

start
  = formula

formula
 = ors
 / ands
 / literal
 / parens_formula

parens_formula
 = "(" formula ")"

ors
 = operand (whitespace "OR" whitespace  operand)+

ands
 =  operand (whitespace "AND" whitespace operand)+

whitespace
 = " "+

operand
 = literal
 / parens_formula

literal
 = integer
 / "-" literal

integer "integer"
  = digits:[0-9]+ { return parseInt(digits.join(""), 10); }
like image 128
phant0m Avatar answered Oct 23 '22 06:10

phant0m


Interesting question. And phant0m's answer is very educational! (and should be used if you understand parsers).

If you want to do this using just regex, the following solution correctly validates an arbitrarily nested logical statement using JavaScript.

Rules/Assumptions:

  • A valid statement consists of only numbers, parentheses, spaces, the AND logical operator and the OR logical operator.
  • The statement must contain at least two "tokens", separated by a logical operator, where each token is either a "number" or a "parenthesized unit".
  • A "number" token is a numeric integer having one or more decimal digits immediately preceded with an optional sign (either + or -).
  • A "parenthesized unit" token is two or more tokens, separated by a logical operator, enclosed within matching open and close parentheses.
  • The statement as a whole may contain more than two tokens but all tokens must be separated by the same, single operator; either AND or OR.
  • Each parenthesized unit may contain more than two tokens but all tokens must be separated by the same, single operator; either AND or OR.
  • Any amount of whitespace may be used between any of the elements (parentheses, numbers and logical operators) but at least one space is required between the numbers and the logical operator.
  • The AND and OR logical operators are not case sensitive.

Examples of valid logical statements:

"1 AND 2"
"1 AND 2 AND 3"
"1 OR 2"
"-10 AND -20"
"100 AND +200 AND -300"
"1 AND (2 OR 3)"
"1 AND (2 OR 3) AND 4"
"1 OR ((2 AND 3 AND 4) OR (5 OR 6 OR 7))"
"( 1 and 2 ) AND (1 AND 2)"

Examples of invalid logical statements:

"1x"            // Invalid character.
"1 AND"         // Missing token.
"1 AND 2 OR 3"  // Mixed logical operators.
"(1"            // Unbalanced parens.
"(((1 AND 2)))" // Too many parens.
"(1 AND) (2)"   // Missing token.
"1"             // Missing logical operator and second number
"1OR2OR3OR4"    // Missing spaces between numbers and operators.
"(1) AND (2)"   // Invalid parentheses.

A Regex Solution:

This problem requires matching nested parenthesized structures and the JavaScript regex engine does not support recursive expressions, therefore this problem cannot be solved in one whack using a single regex. However, the problem can be simplified into two parts each of which can be solved using a single JavaScript regex. The first regex matches innermost parenthesized units and the second validates a simplified logical statement (which has no parentheses).

Regex #1: Match innermost parenthesized unit.

The following regex matches one parenthesized unit, which consists of two or more number tokens, where numbers are all separated by either AND or OR with at least one space between the numbers and the logical operators. The regex is fully commented and formatted for easy readability in PHP free-spacing mode syntax:

$re_paren = '/
    # Match innermost "parenthesized unit".
    \(            # Start of innermost paren group.
    \s*           # Optional whitespace.
    [+-]?\d+      # First number token (required).
    (?:           # ANDs or ORs (required).
      (?:         # Either multiple AND separated values.
        \s+       # Required whitespace.
        AND       # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number.
      )+          # multiple AND separated values.
    | (?:         # Or multiple OR separated values.
        \s+       # Required whitespace.
        OR        # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number token.
      )+          # multiple OR separated values.
    )             # ANDs or ORs (required).
    \s*           # Optional whitespace.
    \)            # End of innermost paren group.
    /ix';

Regex #2: Validate simplified logical statement.

Here is a (nearly identical except for the boundary anchors) regular expression which validates a simplified logical statement (having only numbers and logical operators and no parentheses). Here it is in commented, free-spacing mode (PHP) syntax:

$re_valid = '/
    # Validate simple logical statement (no parens).
    ^             # Anchor to start of string.
    \s*           # Optional whitespace.
    [+-]?\d+      # First number token (required).
    (?:           # ANDs or ORs (required).
      (?:         # Either multiple AND separated values.
        \s+       # Required whitespace.
        AND       # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number.
      )+          # multiple AND separated values.
    | (?:         # Or multiple OR separated values.
        \s+       # Required whitespace.
        OR        # Logical operator.
        \s+       # Required whitespace.
        [+-]?\d+  # Additional number token.
      )+          # multiple OR separated values.
    )             # ANDs or ORs (required).
    \s*           # Optional whitespace.
    $             # Anchor to end of string.
    /ix';

Note that these two regexes are identical except for the boundary anchors.

A JavaScript Solution:

The following tested JavaScript function uses the above two regexes to solve the problem:

function isValidLogicalStatement(text) {
    var re_paren = /\(\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*\)/ig;
    var re_valid =  /^\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*$/ig;
    // Iterate from the inside out.
    while (text.search(re_paren) !== -1) {
        // Replace innermost parenthesized units with integer.
        text = text.replace(re_paren, "0");
    }
    if (text.search(re_valid) === 0) return true;
    return false;
}

The function uses an iterative technique to first match and replace the innermost parenthesized units, replacing each with a single number token, then check to see if the resulting statement (sans parentheses), is valid.

Addendum: 2012-11-06

In a comment to this answer, the OP now says that there must be spaces between the numbers and operators and that a number or parenthesized unit may NOT stand on its own. With these additional requirements in mind, I've updated the answer above.

like image 22
ridgerunner Avatar answered Oct 23 '22 04:10

ridgerunner