I need to detect whether a string contains any unclosed angle brackets.
I tried to avoid regular expressions by comparing the number of left and right brackets:
```php
if (substr_count($string, '<') !== substr_count($string, '>'))
{
    // Text contains unclosed angle brackets
}
```
But this method will not detect a mistake like this:

```
This is >b<BOLD>/b< word
```
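A quick way to see the problem, as a small sketch (the variable names are only for illustration): both bracket types occur exactly twice in that string, so the count-based check passes even though the very first bracket is a closing one.

```php
<?php
$string = 'This is >b<BOLD>/b< word';

// Both counts are 2, so comparing them reports nothing wrong,
// even though the brackets appear in the wrong order.
$open  = substr_count($string, '<');
$close = substr_count($string, '>');

var_dump($open, $close, $open !== $close); // prints int(2), int(2), bool(false)
```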
I would not recommend using regular expressions for a task like this.
A simple function that checks a string for properly written brackets is quick to write:
```php
/**
 * @param string $str input string
 * @return bool true if all brackets are properly opened and closed, false otherwise
 */
function checkBraces($str)
{
    $strlen = strlen($str); // cache string length for performance
    $openbraces = 0;
    for ($i = 0; $i < $strlen; $i++)
    {
        $c = $str[$i];
        if ($c == '<') // count opening bracket
            $openbraces++;
        if ($c == '>') // count closing bracket
            $openbraces--;
        if ($openbraces < 0) // check for unopened closing brackets
            return false;
    }
    return $openbraces == 0; // check for unclosed open brackets
}
```
Using this code as a basis, it shouldn't be too hard to implement a check that also verifies whether the tag names of the opening and closing tags match, but I'll leave that to you :-)
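One possible take on that exercise, as a minimal sketch rather than a complete HTML validator. The function name `checkTags` is hypothetical. It assumes tags of the form `<name ...>`, `</name>` or self-closing `<name/>`, compares names case-insensitively, forbids a `<` inside a tag, and does not handle comments, `<!DOCTYPE ...>` or `>` inside attribute values. It keeps a stack of open tag names and requires each closing tag to match the most recently opened one.

```php
<?php
/**
 * Hypothetical helper: returns true if every '<' is closed before the next
 * '<', and every closing tag matches the most recently opened tag.
 *
 * @param string $str input string
 * @return bool
 */
function checkTags(string $str): bool
{
    $stack = [];                         // names of currently open tags
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $c = $str[$i];
        if ($c === '>') {
            return false;                // closing bracket with no opening one
        }
        if ($c !== '<') {
            $i++;                        // plain text, skip it
            continue;
        }
        $end = strpos($str, '>', $i + 1);
        if ($end === false) {
            return false;                // '<' that is never closed
        }
        $inner = substr($str, $i + 1, $end - $i - 1);
        if (strpos($inner, '<') !== false) {
            return false;                // another '<' before this one was closed
        }
        $i = $end + 1;                   // continue after the '>'
        if ($inner !== '' && substr($inner, -1) === '/') {
            continue;                    // self-closing tag such as <br/>
        }
        $isClosing = $inner !== '' && $inner[0] === '/';
        $body = $isClosing ? substr($inner, 1) : $inner;
        // Tag name = everything up to the first whitespace (attributes ignored)
        $name = strtolower(substr($body, 0, strcspn($body, " \t\r\n")));
        if ($name === '') {
            return false;                // empty tag such as <> or </>
        }
        if ($isClosing) {
            if (array_pop($stack) !== $name) {
                return false;            // closing tag does not match the open one
            }
        } else {
            $stack[] = $name;            // remember the opened tag
        }
    }
    return $stack === [];                // leftover names were never closed
}
```

For example, `checkTags('This is <b>BOLD</b> word')` returns `true`, while `checkTags('<b><i>x</b></i>')` returns `false` because `</b>` arrives while `<i>` is still open.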
Counting alone will not catch such a mistake, because counting only tells you whether there is an equal number of opening and closing brackets. If you want to be kind to your users and point to the place where they made a mistake, counting is not sufficient and you should use, e.g., a stack (even a plain array used as a stack with `array_push()` and `array_pop()` would suffice). With a stack, you iterate over the string, push a token when you encounter an opening bracket `<`, and pop a token when you hit a closing bracket `>`. In your case:
```
This is >b<BOLD>/b< word
```

you would have to pop first, because the first bracket is `>`, but there is nothing on the stack, so this triggers an error. Let's fix that bracket and continue:

```
This is <b<BOLD>/b< word
```
and run:

- first `<`: push -> ok
- second `<`: push -> if you allow nested brackets, all is ok; otherwise the stack must be empty before pushing, so this bracket is misplaced and you should throw an error
and so on. Once you reach the end of the string, if your stack is not empty, you know that the last `<` you spotted is missing its `>` pair. (If you allow bracket nesting, the logic needed to tell which one is potentially unclosed may be more complicated and can sometimes give false results, as compilers sometimes do in similar cases.)
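The stack walk-through above could be sketched in PHP roughly like this. `findBracketError` is a hypothetical name; it returns the byte offset of the first misplaced bracket, or `null` when everything is balanced, and the stack stores the offsets of unmatched `<` so the error position can be reported to the user. The `$allowNesting` flag is an assumption added to cover both variants discussed above.

```php
<?php
/**
 * Hypothetical helper: returns the offset of the first misplaced bracket,
 * or null if all brackets are balanced. The stack holds offsets of '<'.
 */
function findBracketError(string $str, bool $allowNesting = false): ?int
{
    $stack = [];
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] === '<') {
            if (!$allowNesting && $stack !== []) {
                return $i;          // '<' while another '<' is still open
            }
            array_push($stack, $i);
        } elseif ($str[$i] === '>') {
            if ($stack === []) {
                return $i;          // '>' with nothing to close
            }
            array_pop($stack);
        }
    }
    // Anything left on the stack was never closed: report the last '<' seen.
    return $stack === [] ? null : end($stack);
}
```

For the two strings above it returns `8` (the leading `>`) and `10` (the second `<`). With `$allowNesting = true`, the second string instead reports offset `18`, the last `<` that is left unclosed.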
If you do not plan to allow nested brackets, you can make your code even simpler, as a plain integer variable indicating the state would suffice (e.g. `0` after a `<`, `1` after a `>`, and `-1` for the initial state).
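A minimal sketch of that integer-state variant, assuming nesting is forbidden. `checkBracketsNoNesting` is a hypothetical name, and the state encodes the last bracket seen (`-1` none yet, `0` after `<`, `1` after `>`), which is one reading of the encoding suggested above.

```php
<?php
/**
 * Hypothetical helper: validates brackets when nesting is NOT allowed,
 * using one integer holding the last bracket seen:
 * -1 = none yet, 0 = '<', 1 = '>'.
 */
function checkBracketsNoNesting(string $str): bool
{
    $state = -1;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] === '<') {
            if ($state === 0) {
                return false;   // second '<' before the first was closed
            }
            $state = 0;
        } elseif ($str[$i] === '>') {
            if ($state !== 0) {
                return false;   // '>' without a matching '<'
            }
            $state = 1;
        }
    }
    return $state !== 0;        // still 0 means the last '<' was never closed
}
```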
There is a PCRE regex to check for correctly balanced angle brackets:

```php
'~\A[^<>]*+(<(?>[^<>]|(?1))*+>[^<>]*+)++\z~'
```

See the regex demo.

See details at the Matching Balanced Constructs page at regular-expressions.info.
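In PHP the pattern can be applied with `preg_match()`, which returns `1` on a match and `0` otherwise. `isBalanced` is a hypothetical wrapper name used only for this sketch. Note that, because of the `++` quantifier on Group 1, the pattern as written requires at least one bracket pair, so a string without any brackets does not match; changing that `++` to `*+` would accept such strings.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
// Single quotes keep \A and \z intact for PCRE.
function isBalanced(string $str): bool
{
    return preg_match('~\A[^<>]*+(<(?>[^<>]|(?1))*+>[^<>]*+)++\z~', $str) === 1;
}
```

Unlike the counting approach, nested pairs such as `<a<b>c>` are accepted because the subroutine call `(?1)` re-enters Group 1.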
In short:

- `\A` - start of string
- `[^<>]*+` - zero or more characters other than `<` and `>`
- `(<(?>[^<>]|(?1))*+>[^<>]*+)++` - one or more occurrences of Group 1:
  - `<` - opening `<` bracket
  - `(?>[^<>]|(?1))*+` - zero or more occurrences of either any char other than `<` and `>` (see `[^<>]`) or the whole Group 1 subpattern (the subroutine call `(?1)`)
  - `>` - closing `>` bracket
  - `[^<>]*+` - zero or more characters other than `<` and `>`
- `\z` - end of string.

Loop through the string one character at a time: if the character is `<`, increment a counter, and if it is `>`, decrement the counter. If the counter ever becomes negative, or it is not zero when you reach the end of the string, then the brackets are unbalanced (a closing bracket without an opening one, or an unclosed bracket).