Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can regular expressions be used to match nested patterns? [duplicate]

People also ask

How regular expressions are used for pattern matching?

Pattern matching is used by the shell commands such as the ls command, whereas regular expressions are used to search for strings of text in a file by using commands, such as the grep command. Lists all the files in the directory.

What can be matched using (*) in a regular expression?

You can repeat expressions with an asterisk or plus sign. A regular expression followed by an asterisk ( * ) matches zero or more occurrences of the regular expression. If there is any choice, the first matching string in a line is used.

How a regular expression is used for pattern matching in Python?

Regular expressions, called regexes for short, are descriptions for a pattern of text. For example, a \d in a regex stands for a digit character — that is, any single numeral 0 to 9. Following regex is used in Python to match a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers.


No. It's that easy. A finite automaton (which is the data structure underlying a regular expression) does not have memory apart from the state it's in, and if you have arbitrarily deep nesting, you need an arbitrarily large automaton, which collides with the notion of a finite automaton.

You can match nested/paired elements up to a fixed depth, where the depth is only limited by your memory, because the automaton gets very large. In practice, however, you should use a push-down automaton, i.e a parser for a context-free grammar, for instance LL (top-down) or LR (bottom-up). You have to take the worse runtime behavior into account: O(n^3) vs. O(n), with n = length(input).

There are many parser generators avialable, for instance ANTLR for Java. Finding an existing grammar for Java (or C) is also not difficult.
For more background: Automata Theory at Wikipedia


Using regular expressions to check for nested patterns is very easy.

'/(\((?>[^()]+|(?1))*\))/'

Probably working Perl solution, if the string is on one line:

my $NesteD ;
$NesteD = qr/ \{( [^{}] | (??{ $NesteD }) )* \} /x ;

if ( $Stringy =~ m/\b( \w+$NesteD )/x ) {
    print "Found: $1\n" ;
  }

HTH

EDIT: check:

  • http://dev.perl.org/perl6/rfc/145.html
  • ruby information: http://www.ruby-forum.com/topic/112084
  • more perl: http://www.perlmonks.org/?node_id=660316
  • even more perl: https://metacpan.org/pod/Text::Balanced
  • perl, perl, perl: http://perl.plover.com/yak/regex/samples/slide083.html

And one more thing by Torsten Marek (who had pointed out correctly, that it's not a regex anymore):

  • http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2008-03/msg01047.html

Yes, if it is .NET RegEx-engine. .Net engine supports finite state machine supplied with an external stack. see details


The Pumping lemma for regular languages is the reason why you can't do that.

The generated automaton will have a finite number of states, say k, so a string of k+1 opening braces is bound to have a state repeated somewhere (as the automaton processes the characters). The part of the string between the same state can be duplicated infinitely many times and the automaton will not know the difference.

In particular, if it accepts k+1 opening braces followed by k+1 closing braces (which it should) it will also accept the pumped number of opening braces followed by unchanged k+1 closing brases (which it shouldn't).