Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can regexes containing ordered alternation be rewritten to use only unordered alternation?

Suppose I have a regex language supporting literals, positive and negative character classes, ordered alternation, the greedy quantifiers ?, *, and +, and the nongreedy quantifiers ??, *?, and +?. (This is essentially a subset of PCRE without backreferences, look-around assertions, or some of the other fancier bits.) Does replacing ordered alternation with unordered alternation decrease the expressive power of this formalism?

(Unordered alternation---also sometimes called "unordered choice"---is such that L(S|T) = L(S) + L(T), while ordered alternation is such that L(S|T) = L(S) + (L(T) - { a in L(T) : a extends some b in L(S) }). Concretely, the pattern a|aa would match the strings a and aa if the alternation is unordered, but only a if the alternation is ordered.)

Put another way, given a pattern S containing an ordered alternation, can that pattern be rewritten to an equivalent pattern T which contains no ordered alternations (but possibly unordered alternations instead)?

If this question has been considered in the literature, I'd appreciate any references which anyone can provide. I was able to turn up almost no theoretical work on the expressive power of extended regex formalisms (beyond the usual things about how backreferences move you from regular languages to context-free grammars).

like image 959
uckelman Avatar asked Jul 20 '11 18:07

uckelman


1 Answers

in http://swtch.com/~rsc/regexp/regexp3.html [section "Does the regexp match a substring of the string? If so, where?"] it's necessary to introduce the idea of priorities within the "DFA" (you need to read the entire series to understand, i suspect, but the "DFA" in question is expanded from the NFA graph "on the fly") to handle ordered alternations. while this is only an appeal to authority, and not a proof, i think it's fair to say that if russ cox can't do it (express ordered alternations as a pure DFA), then no-one knows how to.

like image 105
andrew cooke Avatar answered Sep 18 '22 23:09

andrew cooke