I'm still relatively new to regex. I'm trying to find the shortest string of text that matches a particular pattern, but am having trouble if the shortest pattern is a substring of a larger match. For example:
import re
string = "A|B|A|B|C|D|E|F|G"
my_pattern = 'a.*?b.*?c'
my_regex = re.compile(my_pattern, re.DOTALL|re.IGNORECASE)
matches = my_regex.findall(string)
for match in matches:
print match
prints:
A|B|A|B|C
but I'd want it to return:
A|B|C
Is there a way to do this without having to loop over each match to see if it contains a substring that matches?
To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).
For example, the regular expression "[ A-Za-z] " specifies to match any single uppercase or lowercase letter. In the character set, a hyphen indicates a range of characters, for example [A-Z] will match any one capital letter.
Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow.
Contrary to most other answers here, this can be done in a single regex using a positive lookahead assertion with a capturing group:
>>> my_pattern = '(?=(a.*?b.*?c))'
>>> my_regex = re.compile(my_pattern, re.DOTALL|re.IGNORECASE)
>>> matches = my_regex.findall(string)
>>> print min(matches, key=len)
A|B|C
findall()
will return all possible matches, so you need min()
to get the shortest one.
How this works:
No. Perl returns the longest, leftmost match, while obeying your non-greedy quantifiers. You'll have to loop, I'm afraid.
Edit: Yes, I realize I said Perl above, but I believe it is true for Python.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With