Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where do regexes with backreferences belong in the chomsky hierarchy (do they belong there at all)?

Tags:

regex

I'm somewhat confused here - RegExes with back references are apparently not regular expressions, because they they can, for instance, be used to describe the copy language ('ww' for any word w), which is context-sensitive. Yet at the same time, they still can't be used to describe context free languages like HTML (or even just matching parentheses) - at least I wouldn't know how such a thing would look like in e.g. POSIX regular expressions.

That being said - do "regular expressions" of this kind belong anywhere in the Chomsky hierarchy, or are they some frankenstein-abomination between the lines?

like image 889
Cubic Avatar asked Oct 20 '25 11:10

Cubic


1 Answers

They don't really fit.

Regexes with backreferences can match some non-context-free languages (for example (.*)\1) but also can't match all context-free languages (the typical example being nested parentheses).

Here is a relevant post on the CSTheory StackExchange, which has a few more details.

Note also that some implementations (e.g. .NET or Perl) go further than backreferences and can match nested parentheses.

like image 146
porges Avatar answered Oct 23 '25 02:10

porges