I'm trying to match text like sp { ...{...}... }
, where the curly braces are allowed to nest. This is what I have so far:
my $regex = qr/
( #save $1
sp\s+ #start Soar production
( #save $2
\{ #opening brace
[^{}]* #anything but braces
\} #closing brace
| (?1) #or nested braces
)+ #0 or more
)
/x;
I just cannot get it to match the following text: sp { { word } }
. Can anyone see what is wrong with my regex?
The square brackets match any one of characters inside the brackets. A range of characters in the alphabet can be matched using the hyphen. For example, "/[xyz]/ "will match any of "x", "y", or "z".
Integer values enclosed in {} indicate the number of times to apply the preceding regular expression. n is the minimum number, and u is the maximum number. If you specify only n, it indicates the exact number of times to apply the regular expression.
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9. (a-z0-9) -- Explicit capture of a-z0-9 .
Definition and Usage The \f metacharacter matches form feed characters.
There are numerous problems. The recursive bit should be:
(
(?: \{ (?-1) \}
| [^{}]+
)*
)
All together:
my $regex = qr/
sp\s+
\{
(
(?: \{ (?-1) \}
| [^{}]++
)*
)
\}
/x;
print "$1\n" if 'sp { { word } }' =~ /($regex)/;
This is case for the underused Text::Balanced
, a very handy core module for this kind of thing. It does rely on the pos
of the start of the delimited sequence being found/set first, so I typically invoke it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced 'extract_bracketed';
sub get_bracketed {
my $str = shift;
# seek to beginning of bracket
return undef unless $str =~ /(sp\s+)(?={)/gc;
# store the prefix
my $prefix = $1;
# get everything from the start brace to the matching end brace
my ($bracketed) = extract_bracketed( $str, '{}');
# no closing brace found
return undef unless $bracketed;
# return the whole match
return $prefix . $bracketed;
}
my $str = 'sp { { word } }';
print get_bracketed $str;
The regex with the gc
modifier tells the string to remember where the end point of the match is, and extract_bracketed
uses that information to know where to start.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With