
Regex with recursive expression to match nested braces?

I'm trying to match text like sp { ...{...}... }, where the curly braces are allowed to nest. This is what I have so far:

my $regex = qr/
(                   #save $1
    sp\s+           #start Soar production
    (               #save $2
        \{          #opening brace
        [^{}]*      #anything but braces
        \}          #closing brace  
        | (?1)      #or nested braces
    )+              #0 or more
)
/x;

I just cannot get it to match the following text: sp { { word } }. Can anyone see what is wrong with my regex?
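You can see the problem by running the pattern on a few inputs. The sketch below is illustrative only. It strips the pattern's comments, and the test strings (including the made-up 'sp sp { word }') are not from the question. Because (?1) re-enters the whole of group 1, sp\s+ included, the pattern accepts a second "sp" but rejects a nested brace pair.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The question's pattern with its comments stripped. Group 1 is the whole
# "sp ..." construct, so the recursion also requires "sp" plus whitespace.
my $question_re = qr/
    (
        sp\s+
        (
            \{ [^{}]* \}
            | (?1)
        )+
    )
/x;

# Illustrative inputs: a flat block, a nested block, and a doubled "sp".
my %matches;
for my $input ('sp { word }', 'sp { { word } }', 'sp sp { word }') {
    $matches{$input} = $input =~ $question_re ? 1 : 0;
    printf "%-18s %s\n", $input, $matches{$input} ? 'match' : 'no match';
}
```

Only the flat block and the doubled "sp" match. The nested input fails because neither alternative can consume the inner "{".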

Nate Glenn asked Oct 04 '12 03:10




2 Answers

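There are numerous problems. (?1) recurses into the whole of group 1, including sp\s+, so every nested level would have to begin with "sp" again. And [^{}]* only allows brace-free text between a pair of braces, so nothing can nest inside them. The recursive bit should be: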

(
   (?: \{ (?-1) \}
   |   [^{}]+
   )*
)

All together:

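my $regex = qr/
   sp\s+                  # "sp" and the whitespace after it
   \{                     # opening brace of the production
      (                   # the brace body, which the recursion re-enters
         (?: \{ (?-1) \}  # a nested brace pair, matched recursively
         |   [^{}]++      # or a run of non-brace characters, possessive
         )*
      )
   \}                     # the matching closing brace
/x;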

print "$1\n" if 'sp { { word } }' =~ /($regex)/;
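The sketch below is an illustration and is not part of the original answer. It runs the combined pattern on a few made-up inputs and compares it with a hypothetical variant that uses the absolute (?1) instead of (?-1). The example wraps the pattern in an extra capture group, as /($regex)/ does, so group 1 becomes that outer group. The absolute version then recurses into sp\s+ and stops matching nested braces, while the relative reference still points at the brace body. Recursion, (?-1) and possessive quantifiers such as ++ need Perl 5.10 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The answer's pattern, using a relative reference to the brace body.
my $relative_re = qr/
   sp\s+
   \{
      (
         (?: \{ (?-1) \}
         |   [^{}]++
         )*
      )
   \}
/x;

# Hypothetical variant with an absolute group number, for comparison only.
my $absolute_re = qr/
   sp\s+
   \{
      (
         (?: \{ (?1) \}
         |   [^{}]++
         )*
      )
   \}
/x;

my @inputs = ('sp { { word } }', 'sp { a { b { c } } d }', 'sp { { word }');
my (%relative_result, %absolute_result);
for my $input (@inputs) {
    # Wrapping each pattern in another capture group shifts absolute
    # group numbers by one but leaves relative references unchanged.
    $relative_result{$input} = $input =~ /($relative_re)/ ? $1 : 'no match';
    $absolute_result{$input} = $input =~ /($absolute_re)/ ? $1 : 'no match';
    print "$input\n  relative: $relative_result{$input}\n",
          "  absolute: $absolute_result{$input}\n";
}
```

The relative version matches both balanced inputs in full and rejects the unbalanced one. The absolute version fails as soon as it meets a nested brace.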
ikegami answered Nov 19 '22 05:11


This is a case for the underused Text::Balanced, a very handy core module for this kind of thing. It starts extracting at the string's current pos(), so that position has to be found or set at the start of the delimited sequence first. I typically invoke it like this:

#!/usr/bin/env perl

use strict;
use warnings;

use Text::Balanced 'extract_bracketed';

sub get_bracketed {
  my $str = shift;

  # find "sp" and leave pos($str) just before the opening brace
  return unless $str =~ /(sp\s+)(?=\{)/gc;

  # store the prefix
  my $prefix = $1;

  # get everything from the start brace to the matching end brace
  my ($bracketed) = extract_bracketed( $str, '{}');

  # no closing brace found
  return unless $bracketed;

  # return the whole match
  return $prefix . $bracketed;
}

my $str = 'sp { { word } }';

print get_bracketed($str), "\n";

The match with the /gc modifiers sets pos($str) to the end of the match. The /g modifier in scalar context records the position, and /c keeps a failed match from resetting it. extract_bracketed uses that position to know where to start.
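The same pos() hand-off also works for scanning a string that holds several productions. The sketch below uses a hypothetical all_productions helper that is not part of the original answer. Each /gc match leaves pos() just before a "{". On success, extract_bracketed in list context leaves the string itself unchanged but moves pos() past the extracted block, so the next /gc match resumes after it. This sketch stops at the first unbalanced block rather than trying to recover.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Text::Balanced 'extract_bracketed';

# Hypothetical helper, not from the answer: return every "sp {...}"
# production found in the text, in order.
sub all_productions {
    my ($text) = @_;
    my @found;

    # In scalar context, //gc resumes at pos() and, on success, leaves pos
    # right after "sp " with the opening brace next.
    while ( $text =~ /\b(sp\s+)(?=\{)/gc ) {
        my $prefix = $1;

        # List context: the string itself is not modified, but on success
        # pos() moves past the extracted block.
        my ($bracketed) = extract_bracketed( $text, '{}' );

        # Unbalanced braces: stop instead of trying to resynchronise.
        last unless defined $bracketed;

        push @found, $prefix . $bracketed;
    }
    return @found;
}

print "$_\n" for all_productions('sp { a { b } } junk sp { c }');
```

For the sample string this prints "sp { a { b } }" and "sp { c }" on separate lines.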

Joel Berger answered Nov 19 '22 05:11