I need to find all the strings matching a pattern with the exception of two given strings.
For example, find all groups of letters with the exception of aa and bb. Starting from this string:
-a-bc-aa-def-bb-ghij-
Should return:
('a', 'bc', 'def', 'ghij')
I tried with this regular expression that captures 4 strings. I thought I was getting close, but (1) it doesn't work in Python and (2) I can't figure out how to exclude a few strings from the search. (Yes, I could remove them later, but my real regular expression does everything in one shot and I would like to include this last step in it.)
I said it doesn't work in Python because I tried this, expecting the exact same result, but instead I get only the first group:
>>> import re
>>> re.search(r'-(\w.*?)(?=-)', '-a-bc-def-ghij-').groups()
('a',)
I tried with negative look ahead, but I couldn't find a working solution for this case.
You can make use of negative look aheads.
For example,
>>> import re
>>> string = '-a-bc-aa-def-bb-ghij-'
>>> re.findall(r'-(?!aa|bb)([^-]+)', string)
['a', 'bc', 'def', 'ghij']
- - matches a literal hyphen
(?!aa|bb) - negative lookahead; checks that the hyphen is not followed by aa or bb
([^-]+) - matches one or more characters other than -
Edit
The above regex will also skip groups that merely start with aa or bb, for example -aabc-. To take care of that, we can add - to the lookaheads, like this:
>>> re.findall(r'-(?!aa-|bb-)([^-]+)', string)
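To see what the extra hyphen changes, here is a small comparison of the two patterns. The sample string (the original input with -aabc- appended) and the variable names are made up for illustration.

```python
import re

# Hypothetical input: the original sample plus a group that merely starts with "aa".
sample = '-a-bc-aa-def-bb-ghij-aabc-'

# First version: the lookahead rejects every group that starts with aa or bb.
prefix_excluded = re.findall(r'-(?!aa|bb)([^-]+)', sample)

# Edited version: the lookahead rejects only groups that are exactly aa or bb.
exact_excluded = re.findall(r'-(?!aa-|bb-)([^-]+)', sample)

print(prefix_excluded)  # ['a', 'bc', 'def', 'ghij']
print(exact_excluded)   # ['a', 'bc', 'def', 'ghij', 'aabc']
```

Note that the edited pattern relies on every group being followed by a hyphen, as in the sample string.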
You need to use a negative lookahead to restrict a more generic pattern, and re.findall to find all matches.
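As a quick illustration of why the attempt in the question returned only the first group, this sketch runs the question's own pattern through both functions. The variable names are chosen here for illustration.

```python
import re

text = '-a-bc-def-ghij-'
pattern = r'-(\w.*?)(?=-)'

# re.search stops at the first match, so groups() holds only that match's capture group.
first_only = re.search(pattern, text).groups()

# re.findall scans the whole string and returns the capture group of every match.
all_matches = re.findall(pattern, text)

print(first_only)   # ('a',)
print(all_matches)  # ['a', 'bc', 'def', 'ghij']
```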
Use
res = re.findall(r'-(?!(?:aa|bb)-)(\w+)(?=-)', s)
or, if the values between hyphens can be any characters other than a hyphen, use a negated character class [^-]:
res = re.findall(r'-(?!(?:aa|bb)-)([^-]+)(?=-)', s)
Here is the regex demo.
Details:
- - a hyphen
(?!(?:aa|bb)-) - if there is aa- or bb- after the first hyphen, no match should be returned
(\w+) - Group 1 (this value will be returned by the re.findall call), capturing 1 or more word chars, OR [^-]+ - 1 or more characters other than -
(?=-) - there must be a - after the word chars. The lookahead is required here so that this hyphen is not consumed and can serve as the starting point for the next match.
Python demo:
import re
p = re.compile(r'-(?!(?:aa|bb)-)([^-]+)(?=-)')
s = "-a-bc-aa-def-bb-ghij-"
print(p.findall(s)) # => ['a', 'bc', 'def', 'ghij']
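To illustrate why the trailing hyphen is checked with a lookahead rather than matched directly, the sketch below compares the pattern above with a variant that consumes the closing hyphen. The variant and the variable names are only for comparison and are not part of the recommended solution.

```python
import re

s = "-a-bc-aa-def-bb-ghij-"

# Trailing hyphen checked with a lookahead: it stays available for the next match.
with_lookahead = re.findall(r'-(?!(?:aa|bb)-)([^-]+)(?=-)', s)

# Trailing hyphen consumed: the next group has lost its leading hyphen and is skipped.
consuming = re.findall(r'-(?!(?:aa|bb)-)([^-]+)-', s)

print(with_lookahead)  # ['a', 'bc', 'def', 'ghij']
print(consuming)       # ['a', 'def', 'ghij']
```

In the consuming variant, bc is lost because the hyphen that should open it was already used to close a.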