Can regexes containing nongreedy (reluctant) quantifiers be rewritten to use only greedy ones?

Tags:

Suppose I have a regex language supporting literals, positive and negative character classes, ordered alternation, and the greedy quantifiers ?, *, and +. (This is essentially a subset of PCRE without backreferences, look-around assertions, or some of the other fancier bits.) Does adding the nongreedy quantifiers ??, *?, and +? increase the expressive power of this formalism?

Put another way, given a pattern S containing a nongreedy quantifier, can that pattern be rewritten to an equivalent pattern T which contains no nongreedy quantifiers?

If this question has been considered in the literature, I'd appreciate any references which anyone can provide. I was able to turn up almost no theoretical work on the expressive power of extended regex formalisms (beyond the usual things about how backreferences move you from regular languages to context-free grammars).

209

asked Jul 20 '11 18:07

uckelman

1 Answers

When you say "Regex", you're refering to a several techniques - this isn't just an issue of the underlying theory. Consider the question "does this string match my given Regex?" For such a question, the notion of "greedy" is merely an implementation detail - if you're using one of the common (but inefficient) backtracking implementations, this can affect performance but not output. Similarly, the question "does this string contain a match?" is not affected by greedy vs. non-greedy quantifiers. This first type of regex is concerning with the abstract notion of set-membership: defining the language of matching strings.

So why do non-greedy quantifiers even exist? Regexes are not merely used for matching; common implementations can locate where the match is and which parts of the regex match which parts of the output. By doing this, a user depends on the intricacies of the implementation, which isn't trivial. This second type of regex is concerned with getting a few bits of text into a more practical representation withing the context of an otherwise turing-complete language.

Generally, when you talk of the strength of the regular expression formalism, you're talking about the first world - in which the computer answers with a simple yes or no. It's easy to talk about because the specification is clear. When you talk of a greedy vs. non-greedy quantifier, you're talking about the second world - used lots in practice, but with a specification that's mostly grown without much planning to solve real problems and is a standard by virtue of backwards compatibility. This second world is solving entirely different problems, and it's not clear to me what "expressive power" should even mean here. Certainly, non-greedy can be practical; and that's kind of the point...

Non-greedy quantifiers don't do anything for the first type of expressiveness, and they do for the second, although it's not quite clear what "expressive power" means here anyhow.

answered Sep 23 '22 13:09

Eamon Nerbonne

Related questions
                            
                                Spring request mapping with regex like in javax.ws.rs
                            
                                regexp-like library for matrix pattern search
                            
                                Problems with the two parameter format function in boost::regex_replace
                            
                                Best Practice: Partial Regex Matching
                            
                                ASP.NET MVC4 regular expression model validation attribute doesn't work client side with accented characters
                            
                                How to capture 0-2 groups in C++ regular expressions and print them?
                            
                                (PHP) Parsing RegEx string - balancing brackets
                            
                                How does Eclipse execute such a fast search for hits to a phrase/regexp
                            
                                vim pattern matching two tokens without another given token inbetween
                            
                                Apache regex backreferences UNREACHABLE in httpd 2.4?
                            
                                Regular Expressions (Normal OR Nested Brackets)
                            
                                Regular expression for syntax highlighting attributes in HTML tag
                            
                                Regular expressions in ignored_words in Sublime Text 3 spell_check?
                            
                                Search and replace C# expression
                            
                                Java Regex to find particular file
                            
                                why I cannot suppress warning with regex using warnings.filterwarnings
                            
                                Filtering logs with regex in java
                            
                                IntelliJ says "\\" (match on single backslash) is an illegal / unsupported escape sequence for Pattern.compile
                            
                                JavaScript: split doesn't work in IE?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can regexes containing nongreedy (reluctant) quantifiers be rewritten to use only greedy ones?

Tags:

regex

theory

non-greedy

pcre

uckelman

People also ask

1 Answers

Eamon Nerbonne

Recent Activity

Donate For Us