
How to exclude specific string using regex in Python?

I'd like to match strings like:

45 meters?
45, meters?
45?
45 ?

but not strings like:

45 meters you?
45 you  ?
45, and you?

In both cases the question mark must be at the end. So, essentially I want to exclude all those strings containing the word "you".

I've tried the following regex:

'\d+.*(?!you)\?$'

but it also matches the strings in the second group (probably because of .*)

asked Dec 01 '22 19:12 by f_ficarola



1 Answer

There's a neat trick to exclude some matches from a regex, which you can use here:

>>> import re
>>> corpus = """
... 45 meters?
... 45?
... 45 ?
... 45 meters you?
... 45 you  ?
... 45, and you?
... """
>>> pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")
>>> re.findall(pattern, corpus)
['45 meters?', '45?', '45 ?', '', '', '']

The downside is that you get an empty string for every match where the exclusion kicks in, but those are easily filtered out:

>>> list(filter(None, re.findall(pattern, corpus)))
['45 meters?', '45?', '45 ?']
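
Since the question is about individual strings, the same trick can be wrapped in a small predicate. This is only a sketch: the name is_wanted is made up for illustration, and it assumes each candidate is its own string that starts with the digits and must end with the question mark (hence the added $ on the capturing branch).

```python
import re

# Same alternation as above. Assumption: "?" must be the last character,
# so the capturing (right-hand) branch is anchored with $.
PATTERN = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)$")


def is_wanted(s):
    """Return True only if the capturing branch matched at the start of s."""
    m = PATTERN.match(s)
    # group(1) is None when the discarding (left-hand) branch matched.
    return m is not None and m.group(1) is not None


for s in ["45 meters?", "45, meters?", "45?", "45 ?",
          "45 meters you?", "45 you  ?", "45, and you?"]:
    print(repr(s), is_wanted(s))
```

Note that a string such as "45 young?" is rejected as well, because the left-hand branch looks for the letters you rather than the whole word.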

How it works:

The trick is that we only pay attention to the captured group. The left-hand side of the alternation, \d+[^?]*you ("digits, then non-? characters, then 'you'"), matches what you don't want. It contains no capturing group, so findall reports that match as an empty string, which we then discard. The regex engine tries the right-hand side, (\d+[^?]*\?) ("digits, then non-? characters, then '?'"), only when the left-hand side fails at the current position, and that match is the one that gets captured.
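
To see which side of the alternation fires for each line, one option is to use finditer and compare the whole match with the captured group. This is an illustrative sketch using the same pattern and corpus; the helper name trace_matches is hypothetical.

```python
import re

# Left branch consumes unwanted text; right branch captures wanted text.
pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")

corpus = """
45 meters?
45?
45 ?
45 meters you?
45 you  ?
45, and you?
"""


def trace_matches(text):
    """Return (whole_match, captured) pairs for every match in text.

    captured is None when the left-hand (discarding) branch matched.
    """
    return [(m.group(0), m.group(1)) for m in pattern.finditer(text)]


for whole, captured in trace_matches(corpus):
    branch = "kept" if captured is not None else "discarded"
    print(f"{whole!r:20} -> {branch}")

# Keep only the matches that came from the capturing branch.
kept = [captured for _, captured in trace_matches(corpus) if captured is not None]
print(kept)
```

Filtering on `captured is not None` plays the same role as filtering out the empty strings that findall returns.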

answered Dec 05 '22 03:12 by Zero Piraeus
