Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aren't modern regular expression dialects regular?

I've seen a few comments here that mention that modern regular expressions go beyond what can be represented in a regular language. How is this so?

What features of modern regular expressions are not regular? Examples would be helpful.

like image 841
David Johnstone Avatar asked Sep 30 '10 05:09

David Johnstone


People also ask

Are all regular expressions regular languages?

Regular expressions are used to denote regular languages. They can represent regular languages and operations on them succinctly. The set of regular expressions over an alphabet is defined recursively as below. Any element of that set is a regular expression.

Are regular expressions still used?

Despite being hard to read, hard to validate, hard to document and notoriously hard to master, regexes are still widely used today. Supported by all modern programming languages, text processing programs and advanced text editors, regexes are now used in more than a third of both Python and JavaScript projects.

Why is regex so hard?

Regular expressions are dense. This makes them hard to read, but not in proportion to the information they carry. Certainly 100 characters of regular expression syntax is harder to read than 100 consecutive characters of ordinary prose or 100 characters of C code.


1 Answers

The first thing that comes to mind is backreferences:

(\w*)\s\1

(matches a group of word characters, followed by a space character and then the same group previously matched) eg: hello hello matches, hello world doesn't.

This construct is not regular (ie: can't be generated by a regular grammar).


Another feature supported by Perl Compatible RegExp (PCRE) that is not regular are recursive patterns:

\((a*|(?R))*\)

This can be used to match any combination of balanced parentheses and "a"s (from wikipedia)

like image 115
NullUserException Avatar answered Nov 16 '22 00:11

NullUserException