I am trying to exclude one group of words but include another group of words in a QRegExp expression, and I am having trouble figuring this out.
Here are some of the things I tried (this example included all of the words):
(words|I|want|to|include)(?!the|ones|that|should|not|match)
So I tried this (which returned nothing):
^(words|I|want|to|include)(?:(?!the|ones|that|should|not|match).)*$
Am I missing something?
Edit: The reason why I need such an unusual regex (include/exclude) is because I want to search through a series of articles and filter the ones that have the included words in them but not if they also have the excluded words in them.
So for example if article A is:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
and article B is:
Vivamus fermentum semper porta.
Then a regex that includes lorem would filter article A but not B. But if ipsum is a word that I'm excluding, I do not want article A to be filtered.
I considered doing a regex to filter out the articles with the words that I want and then run a second regex excluding articles from the first set that I do not want, but unfortunately the software I am using does not allow me to do this. I can only run one regular expression.
I think there is no need for a tempered greedy token. Use the excluded words as alternatives inside an anchored negative lookahead. Let me guide you through this.
You say you have the string "Lorem ipsum dolor sit amet, consectetur adipiscing elit." and you want it to match since it contains the word lorem. The regex is \\blorem\\b (with QRegExp.CaseInsensitive set to 1), where \b is used to force whole-word matching. To prevent the match when the string contains the word ipsum, you need to use a negative lookahead at the very beginning of the string:
^(?!.*\\bipsum\\b).*\\blorem\\b
Now, it does not match the string in question.
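As a quick check of the pattern above, here is a minimal sketch that uses Python's re module as a stand-in for QRegExp (an assumption: QRegExp itself is not exercised here, and re.IGNORECASE plays the role of the case-insensitive setting). Articles A and B are the ones from the question; article C is a hypothetical extra that contains lorem but not ipsum.

```python
import re

# Python's re is only a stand-in for QRegExp here. In a raw string the word
# boundary is written \b; the \\b in the answer is presumably the same
# escape written inside an ordinary string literal.
LOREM_ONLY = re.compile(r"\blorem\b", re.IGNORECASE)
GUARDED = re.compile(r"^(?!.*\bipsum\b).*\blorem\b", re.IGNORECASE)

article_a = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
article_b = "Vivamus fermentum semper porta."
article_c = "Lorem dolor sit amet."  # hypothetical: lorem without ipsum


def is_match(pattern, text):
    """Return True if the pattern matches somewhere in the text."""
    return pattern.search(text) is not None


a_plain = is_match(LOREM_ONLY, article_a)  # lorem alone matches article A
a_guarded = is_match(GUARDED, article_a)   # the lookahead rejects it (ipsum)
b_guarded = is_match(GUARDED, article_b)   # no lorem at all
c_guarded = is_match(GUARDED, article_c)   # lorem present, ipsum absent
```

The lookahead is checked once, at the start of the string, so a single ipsum anywhere later in the text is enough to reject the whole article.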
To add more alternatives, we can use the alternation operator |, like this: ^(?!.*\\b(?:words|to|exclude)\\b).*\\b(?:words|to|include)\\b. Note the use of non-capturing groups: they do not store the captured text, which potentially improves performance compared with capturing groups that save the matched text in a buffer.
Thus, you get
^(?!.*\\b(?:the|ones|that|should|not|match)\\b).*\\b(?:words|I|want|to|include)\\b
See demo
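If the word lists change often, the same pattern can be assembled from two lists instead of being typed by hand. The following is only a sketch, again with Python's re standing in for QRegExp; build_filter is a hypothetical helper name, and the sample sentences are made up for illustration.

```python
import re


def build_filter(include, exclude):
    # Hypothetical helper: joins each list with | inside a non-capturing
    # group and wraps it in whole-word boundaries, as in the answer.
    inc = "|".join(re.escape(word) for word in include)
    exc = "|".join(re.escape(word) for word in exclude)
    return re.compile(
        r"^(?!.*\b(?:" + exc + r")\b).*\b(?:" + inc + r")\b",
        re.IGNORECASE,
    )


flt = build_filter(
    ["words", "I", "want", "to", "include"],
    ["the", "ones", "that", "should", "not", "match"],
)

kept = flt.search("I want cake") is not None          # included word, nothing excluded
dropped = flt.search("I want the cake") is not None   # "the" is excluded
partial = flt.search("tomorrow") is not None          # "to" is not a whole word here
```

re.escape is used so that a word containing regex metacharacters would still be treated literally; for plain words like these it leaves the text unchanged.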
Two remarks:
In QRegExp, the . in the pattern matches any character including a newline. At the demo Web site, the dot does not match newline symbols. You may want to replace it with [^\n] if you need the same functionality, but I think it is not necessary.
^(?:(?!\b(?:the|ones|that|should|not|match)\b).)*\b(?:words|I|want|to|include)\b(?:(?!\b(?:the|ones|that|should|not|match)\b).)*$
You need to add the lookahead to both parts once you have the words which should match. See demo.
https://regex101.com/r/bK9wF1/3
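To see how the two guarded parts fit around the included word, here is a small sketch that assembles this pattern from its pieces and tries it on a few made-up sentences. It uses Python's re rather than QRegExp, and the names EXCLUDE, INCLUDE, TEMPERED and keeps are illustrative only.

```python
import re

EXCLUDE = r"\b(?:the|ones|that|should|not|match)\b"
INCLUDE = r"\b(?:words|I|want|to|include)\b"

# (?:(?!EXCLUDE).)* consumes one character at a time, but only at positions
# where no excluded word starts.
TEMPERED = r"(?:(?!" + EXCLUDE + r").)*"

# Guarded text, then an included word, then guarded text up to the end.
pattern = re.compile("^" + TEMPERED + INCLUDE + TEMPERED + "$")


def keeps(text):
    """Return True if the text passes the include/exclude filter."""
    return pattern.search(text) is not None


# Hypothetical sample sentences, not taken from the question.
ok = keeps("I want cake")             # included word, nothing excluded
after = keeps("I want the cake")      # excluded word after the included one
before = keeps("the words")           # excluded word before the included one
neither = keeps("cake only")          # no included word at all
```

Because the guard sits on both sides of the included word, an excluded word is caught whether it comes before or after it.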
or
^(?!.*\b(?:the|ones|that|should|not|match)\b)(?=.*\b(?:words|I|want|to|include)\b).*$
Add both conditions under lookaheads. See demo.
https://regex101.com/r/uF4oY4/60
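A short sketch of this two-lookahead variant used as an article filter, with Python's re as a stand-in for QRegExp. The function name filter_articles and the sample articles are hypothetical.

```python
import re

pattern = re.compile(
    r"^(?!.*\b(?:the|ones|that|should|not|match)\b)"  # no excluded word anywhere
    r"(?=.*\b(?:words|I|want|to|include)\b)"          # at least one included word
    r".*$"                                            # then consume the whole line
)


def filter_articles(articles):
    """Keep only the articles that satisfy both lookahead conditions."""
    return [text for text in articles if pattern.search(text)]


# Hypothetical sample articles, not taken from the question.
articles = ["I want cake", "I want the cake", "cake only"]
kept = filter_articles(articles)
whole = pattern.search("I want cake").group(0)
```

Both conditions are tested at the start of the string without consuming anything, so the final .*$ makes the match cover the entire line.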