Match anything that is a something?

Tags:

regex

With regex, how can I match everything in a string that isn't something? This may not make sense, but read on.

Take the word baby, for instance. To match everything that isn't a b, you would do something like [^b], and this would match a and y. Simple enough! But in the string Ben sits on a bench, how can I match everything that isn't ben, so that I would be matching sits on a ch?

Better yet, how can I match everything that isn't a pattern? E.g. in 1a2be3, match everything that isn't number,letter,number, so it would match every combination in the string except 1a2?

asked Dec 10 '13 10:12

Srb1313711

2 Answers

(?:ben)|(.)

This regex matches either ben or any other single character; ben isn't captured, but the other characters are. So you'll end up with a list of matches that covers everything except the ben's. You can then join all those matches together to get the string without the ben's.

Here is an example in Python.

import re

thestr = "Ben sits on a bench"
regex = r'(?:ben)|(.)'

# findall returns group 1 for each match: '' where ben matched, the character otherwise
matches = re.findall(regex, thestr, re.IGNORECASE)
print(''.join(matches))

This will output:

 sits on a ch

Note the leading space. You can of course get rid of that by adding .strip().

Also note that it is probably faster to use a regex that replaces ben with an empty string to get the same result. But if you want to use this technique in a more complex regex, it could come in handy.
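As a minimal sketch of the replace approach mentioned above (the variable names are just for illustration), re.sub can delete every ben directly:

```python
import re

thestr = "Ben sits on a bench"

# Replace every case-insensitive occurrence of "ben" with the empty string.
result = re.sub(r'ben', '', thestr, flags=re.IGNORECASE)
print(repr(result))
```

Printing with repr makes the leading space visible; as before, .strip() would remove it.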

And of course you can also put more complex regexes in place of ben, so for example your number,letter,number example would be:

(?:[0-9][a-z][0-9])|(.)
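Applied to the 1a2be3 string from the question, this could look roughly like the sketch below. The helper name remove_matches is made up for illustration, and it assumes the unwanted pattern contains no capturing groups of its own (otherwise findall would return tuples).

```python
import re

def remove_matches(pattern, text):
    """Join everything in text that pattern does not match (hypothetical helper)."""
    # The unwanted pattern goes into a non-capturing group; any other
    # character is captured. re.DOTALL lets (.) keep newlines as well.
    regex = r'(?:' + pattern + r')|(.)'
    return ''.join(re.findall(regex, text, re.IGNORECASE | re.DOTALL))

print(remove_matches(r'[0-9][a-z][0-9]', '1a2be3'))            # be3
print(remove_matches(r'ben', 'Ben sits on a bench').strip())   # sits on a ch
```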

answered Oct 23 '22 06:10

gitaarik

Short answer: You can't do what you're asking. Technically, the first part has an ugly answer, but the second part (as I understand it) has no answer.

For your first part, I have a pretty impractical (yet pure regex) answer; anything better would require code (like @rednaw's much cleaner answer above). I added to the test to make it more comprehensive. (For simplicity, I'm using grep -Pio for PCRE, case insensitive, printing one match per line.)

$ echo "Ben sits on a bench better end" \
    |grep -Pio '(?=b(?!en)|(?<!b)en|e(?!n)|(?<!be)n|[^ben])\w+'
sits
on
a
ch
better
end

I'm basically making a special case for any letter in "ben" so I can include only iterations that are not themselves part of the string "ben." As I said, not really practical, even if I am technically answering your question. I've also saved a blow-by-blow explanation of this regex if you want further detail.

If you're forced into using a pure regex rather than code, your best bet for items like this is to write code to generate the regex. That way you can keep a clean copy of it.
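One way such a generator might look in Python is sketched below. This is not the exact regex from above: the function name build_not_word_regex is made up, the alternatives it emits are slightly more general (for example (?<!b)e instead of (?<!b)en), and it only handles words whose letters are all distinct, because repeated letters would need the lookarounds combined differently.

```python
import re

def build_not_word_regex(word):
    """Build a lookahead regex that skips occurrences of word (hypothetical helper)."""
    if not word or len(set(word.lower())) != len(word):
        raise ValueError("this sketch only handles non-empty words with distinct letters")
    alternatives = []
    for i, ch in enumerate(word):
        prefix = re.escape(word[:i])
        suffix = re.escape(word[i + 1:])
        c = re.escape(ch)
        # A letter is outside an occurrence of word if it is not preceded
        # by the letters before it, or not followed by the letters after it.
        if prefix:
            alternatives.append(f"(?<!{prefix}){c}")
        if suffix:
            alternatives.append(f"{c}(?!{suffix})")
    # Any character that does not occur in word at all is always allowed.
    alternatives.append("[^" + "".join(re.escape(ch) for ch in word) + "]")
    return "(?=" + "|".join(alternatives) + r")\w+"

pattern = build_not_word_regex("ben")
print(pattern)
print(re.findall(pattern, "Ben sits on a bench better end", re.IGNORECASE))
```

The clean copy is then just the word passed to the function, and the unreadable regex is regenerated whenever it changes.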

I'm not sure what you're asking for in the remainder of your challenge; a regex quantifier is either greedy or lazy [1] [2], and I don't know of any implementation that can find "every combination" rather than merely the first match found by either method. If there were such a thing, it would be very slow on real-life input (as opposed to quick examples); regex engines would be intolerably slow if they were forced to examine every possibility, which would basically be a ReDoS.

Examples:

# greedy evaluation (default)
$ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+'
a2be3

# lazy evaluation
$ echo 1a2be3 |grep -Pio '(?!\d[a-z]\d)\w+?'
a
2
b
e
3
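For comparison, the same two patterns can be tried with Python's re module instead of grep; this is only a sketch, on the assumption that re.findall is a fair stand-in for grep -o here:

```python
import re

text = "1a2be3"

# Greedy: \w+ takes as much as it can after the first allowed start position.
greedy = re.findall(r'(?!\d[a-z]\d)\w+', text, re.IGNORECASE)

# Lazy: \w+? stops after a single character, so each match is one character.
lazy = re.findall(r'(?!\d[a-z]\d)\w+?', text, re.IGNORECASE)

print(greedy)  # ['a2be3']
print(lazy)    # ['a', '2', 'b', 'e', '3']
```

Either way, each starting position yields at most one match, which is why neither variant lists every combination.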

I assume you are looking for 1 1a a a2 a2b a2be a2be3 2 2b 2be 2be3 b be be3 e e3 3 but I don't think you can get that with a pure regex. You'd need some code to generate every substring and then you could use a regex to filter out the forbidden pattern (again, this is all about greedy vs lazy vs ReDoS).
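A rough sketch of that "generate every substring, then filter" idea in Python follows. The function name allowed_substrings is made up for illustration, and the number of substrings grows quadratically with the input length, so this is only sensible for short strings.

```python
import re

def allowed_substrings(text, forbidden):
    """Return every substring of text that contains no match of forbidden."""
    pattern = re.compile(forbidden, re.IGNORECASE)
    result = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            candidate = text[start:end]
            # Keep the substring only if the forbidden pattern is absent.
            if not pattern.search(candidate):
                result.append(candidate)
    return result

print(' '.join(allowed_substrings('1a2be3', r'\d[a-z]\d')))
# 1 1a a a2 a2b a2be a2be3 2 2b 2be 2be3 b be be3 e e3 3
```

If the input contains repeated text, the same substring can appear more than once in the result.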

answered Oct 23 '22 04:10

Adam Katz
