I have a list of regexes from which I want to extract those that are equivalent to a string comparison.
For example, those regexes are equivalent to a simple string comparison:
[r"example", # No metacharacters
r"foo\.bar"] # . is not a metacharacter because it is escaped
while those regexes are not:
[r"e.ample", # . is a metacharacter
r"foo\\.bar"] # . is a metacharacter because it is not escaped
According to https://docs.python.org/2/howto/regex.html, the list of valid metacharacters is . ^ $ * + ? { } [ ] \ | ( )
.
I'm about to build a regex, but it looks to be a bit complicated. I'm wondering if there's a shortcut by examining the re
object or something.
Inspired by Keith Hall's comment, here's a solution based on an undocumented feature of Python's regex compiler:
import re, sys, io
def contains_meta(regex):
stdout = sys.stdout # remember stdout
sys.stdout = io.StringIO() # redirect stdout to string
re.compile(regex, re.DEBUG) # compile the regex for the debug tree side effect
output = sys.stdout.getvalue() # get that debug tree
sys.stdout = stdout # restore stdout
return not all(line.startswith("LITERAL ") for line in output.strip().split("\n"))
Output:
In [9]: contains_meta(r"example")
Out[9]: False
In [10]: contains_meta(r"ex.mple")
Out[10]: True
In [11]: contains_meta(r"ex\.mple")
Out[11]: False
In [12]: contains_meta(r"ex\\.mple")
Out[12]: True
In [13]: contains_meta(r"ex[.]mple") # single-character charclass --> literal
Out[13]: False
In [14]: contains_meta(r"ex[a-z]mple")
Out[14]: True
In [15]: contains_meta(r"ex[.,]mple")
Out[15]: True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With