
PHP regexp - Detect unclosed brackets

I need to detect if a string contains any unclosed angle brackets.

I tried to avoid regular expressions by comparing the number of left and right brackets:

if (substr_count($string, '<') !== substr_count($string, '>'))
{
    // Text contains unclosed angle brackets           
}

But this method will not detect a mistake like this:

This is >b<BOLD>/b< word
Koralek M. asked Jun 01 '16 17:06


4 Answers

I would not recommend using regular expressions for a task like this.
A simple function to check a string for properly written brackets is quickly written:

/**
 * @param string $str input string
 * @return bool true if all brackets are properly opened and closed, false otherwise
 */
function checkBraces($str)
{
    $strlen = strlen($str); // cache string length for performance
    $openbraces = 0;

    for ($i = 0; $i < $strlen; $i++)
    {
        $c = $str[$i];
        if ($c == '<') // count opening bracket
            $openbraces++;
        if ($c == '>') // count closing bracket
            $openbraces--;

        if ($openbraces < 0) // check for unopened closing brackets
            return false;
    }

    return $openbraces == 0; // check for unclosed open brackets
}

Using this code as a basis, it shouldn't be too hard to implement a check to verify whether or not the tag name of opening and closing brackets also matches - but I'll leave that to you :-)
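For what it's worth, here is a minimal sketch of that tag-name check. The function name checkTagNames() is made up for illustration, and it assumes simple tags only (no self-closing or void tags, no > inside attribute values). It looks only at well-formed <...> pairs, so it complements checkBraces() rather than replacing it:

```php
<?php
/**
 * Hypothetical helper: checks that tags such as <b>...</b> are closed
 * in the reverse order in which they were opened.
 */
function checkTagNames(string $str): bool
{
    // Capture an optional leading slash and the tag name of every <...> pair.
    preg_match_all('~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>~', $str, $matches, PREG_SET_ORDER);

    $open = []; // stack of tag names that are currently open
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $open[] = $name;      // opening tag: remember its name
        } elseif (array_pop($open) !== $name) {
            return false;         // closing tag does not match the last opened tag
        }
    }

    return count($open) === 0;    // every opened tag must have been closed
}
```

With this sketch, checkTagNames('This is <b>BOLD</b> word') returns true, while checkTagNames('<b><i>x</b></i>') returns false because the tags are closed in the wrong order.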

Cobra_Fast answered Sep 28 '22 15:09


"But this method will not detect mistake like this:"

Because counting makes sense only if you want to check whether there is an equal number of opening and closing brackets. If you want to be kind to your users and point to the place where the mistake was made, counting is not sufficient and you should use e.g. a stack (even an array-based stack built on array_push() and array_pop() would suffice). With a stack, you iterate over your string, push a token when you encounter an opening bracket < and pop a token when you hit a closing one >. In your case:

This is >b<BOLD>/b< word

you would have to pop first, because the first bracket is >, but there is nothing on the stack, so this triggers an error. Let's fix that bracket and continue:

This is <b<BOLD>/b< word

and run through it again:

push -> ok
push -> well, if you allow nested brackets, then all is ok; otherwise the
        stack must be empty prior to pushing, so this bracket is misplaced
        and you should throw an error

and so on. If you reach the end of the string and your stack is not empty, you know that the last spotted < is missing its > pair. (If you allow bracket nesting, the logic needed to tell which bracket is potentially not closed may be more complicated and may sometimes give false results, as compilers sometimes do in similar cases.)
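The stack walk described above could be sketched roughly like this. The function name findBracketError() and the choice to return a byte offset (or null when everything is fine) are my own assumptions for illustration; nesting is allowed in this version:

```php
<?php
/**
 * Hypothetical helper: returns the byte offset of the first misplaced
 * bracket, or null if every < has a matching >.
 */
function findBracketError(string $str): ?int
{
    $stack = []; // offsets of opening brackets that are still open

    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] === '<') {
            array_push($stack, $i);   // remember where this bracket was opened
        } elseif ($str[$i] === '>') {
            if (count($stack) === 0) {
                return $i;            // closing bracket with nothing on the stack
            }
            array_pop($stack);        // pairs with the most recent opening bracket
        }
    }

    // Anything left on the stack was never closed; report the last one spotted.
    return count($stack) > 0 ? end($stack) : null;
}
```

For the string from the question, findBracketError('This is >b<BOLD>/b< word') returns 8, the offset of the stray >. Offsets are in bytes because the sketch uses strlen(), so multi-byte text would need extra care when showing the position to a user.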

If you do not plan to allow nested brackets, you can make your code even simpler, because a plain integer variable indicating the state would suffice (e.g. 0 for <, 1 for > and -1 for the initial state).
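A rough sketch of that non-nested variant follows. Note that it swaps the three-value integer for a boolean, since the initial state and "last bracket was >" behave the same way here; the name checkFlatBrackets() is made up:

```php
<?php
/**
 * Hypothetical helper: returns true if brackets are balanced and never nested.
 */
function checkFlatBrackets(string $str): bool
{
    $inside = false; // true while we are between a < and its >

    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] === '<') {
            if ($inside) {
                return false;   // a second < before the previous one was closed
            }
            $inside = true;
        } elseif ($str[$i] === '>') {
            if (!$inside) {
                return false;   // a > without a preceding <
            }
            $inside = false;
        }
    }

    return !$inside;            // a trailing < was never closed
}
```

Here checkFlatBrackets('This is <b>BOLD</b> word') returns true, while both 'This is >b<BOLD>/b< word' and 'This is <b<BOLD>/b< word' return false.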

Marcin Orlowski answered Sep 28 '22 15:09


There is a PCRE regex to check that angle brackets are balanced:

'~\A[^<>]*+(<(?>[^<>]|(?1))*+>[^<>]*+)++\z~'

See the regex demo

See details at the Matching Balanced Constructs page at regular-expressions.info.

In short:

  • \A - start of string
  • [^<>]*+ - zero or more characters other than < and >
  • (<(?>[^<>]|(?1))*+>[^<>]*+)++ - 1 or more occurrences of
    • < - opening < bracket
    • (?>[^<>]|(?1))*+ - 0 or more occurrences of any char other than < and > (see [^<>]) or the whole Group 1 subpattern (the subroutine call (?1))
    • > - closing > bracket
    • [^<>]*+ - zero or more characters other than < and >
  • \z - end of string.
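A minimal way to try the pattern from PHP is shown below; the wrapper name hasBalancedBrackets() is just an assumption for illustration. Because the group is quantified with ++ (one or more), a string with no brackets at all does not match, so handle that case separately if it should count as valid:

```php
<?php
/**
 * Hypothetical wrapper around the pattern above.
 * Returns true only if the string contains at least one bracket pair
 * and all angle brackets are balanced (nesting allowed).
 */
function hasBalancedBrackets(string $str): bool
{
    $pattern = '~\A[^<>]*+(<(?>[^<>]|(?1))*+>[^<>]*+)++\z~';

    // preg_match() returns 1 on a match, 0 on no match and false on error,
    // so compare strictly against 1.
    return preg_match($pattern, $str) === 1;
}
```

For example, hasBalancedBrackets('This is <b>BOLD</b> word') returns true, and hasBalancedBrackets('This is >b<BOLD>/b< word') returns false.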
Wiktor Stribiżew answered Sep 28 '22 14:09


Loop through the string one character at a time. If the character is a "<", increment a counter, and if it is a ">", decrement the counter. If the counter ever becomes negative, or is not zero when you reach the end of the string, then you have unmatched brackets.

Schleis answered Sep 28 '22 16:09