Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if a given regex will match anything

Tags:

string

regex

perl

Is it possible to check if a given regular expression will match any string? Specifically, I'm looking for a function matchesEverything($regex) that returns true iff $regex will match any string.

I suppose that this is equivalent to asking, "given a regex r, does there exist a string that doesn't match r?" and I don't think this is solvable without placing bounds on the set of "all strings". I.e., if I assume the strings will never contain "blahblah", then I can simply check if r matches "blahblah". But what if there are no such bounds? I'm wondering if this problem can be solved checking if the regex r is equivalent to .*.

like image 988
Christopher Neylan Avatar asked Jul 30 '13 18:07

Christopher Neylan


People also ask

Does * match everything in regex?

Throw in an * (asterisk), and it will match everything. Read more. \s (whitespace metacharacter) will match any whitespace character (space; tab; line break; ...), and \S (opposite of \s ) will match anything that is not a whitespace character.

Does empty regex match everything?

An empty regular expression matches everything.

Which regex will match any character but AB or C?

[abc] : matches a, b, or c. [a-z] : matches every character between a and z (in Unicode code point order). [^abc] : matches anything except a, b, or c.


1 Answers

This doesn't exactly answer your question, but hopefully explains a little why a simple answer is hard to come by:

First, the term 'regex' is a bit murky, so just to clarify, we have:

  • "Strict" regular expressions, which are equivalent to deterministic finite automatons (DFAs).
  • Perl-compatible regular expressions (PCREs), which add a bunch of bells and whistles such as lookaheads, backreferences, etc. These are implemented in other languages too, such as Python and Java.
  • Actual Perl regular expressions, which can get even more crazy, including arbitrary Perl code, via the ?{...} construct.

I think this problem is solvable for strict regular expressions. You just construct the corresponding DFA and search that graph to see if there's any path to a non-accept state. But that doesn't help for 'real world' regex, which is usually PCRE.

I don't think PCRE is Turing-complete (though I don't know - see this question, too: Are Perl regexes turing complete?). If it were, then I think as Jim Garrison commented, this is basically the halting problem. That said, it's not easy to transform them into a DFA, either, making the above method useless...

I don't have an answer for PCREs, but be aware that the aforementioned constructs (backreferences, etc) would make it pretty hard, I imagine. Though I hesitate to say "impossible."

A genuine Perl regex with ?{...} in it is definitely Turing-complete, so there be dragons, and I think you're out of luck.

like image 121
jwd Avatar answered Sep 28 '22 04:09

jwd