Sed regex and substring negation

Tags: regex, sed

What is the correct syntax for finding a substring (a string which is preceded and followed by specific strings) which does not match a specific pattern?

For example, I want to find all substrings that start with BEGIN_ and end with _END, where the substring in between is not equal to FOO, and replace each whole substring with the format "(inner substring)". The following would match:

BEGIN_bar_END -> (bar)
BEGIN_buz_END -> (buz)
BEGIN_ihfd8f398IHFf9f39_END -> (ihfd8f398IHFf9f39)

But BEGIN_FOO_END would not match.

I have played around with the following, but cannot seem to find the correct syntax:

sed -e 's/BEGIN_(^FOO)_END/($1)/g'
sed -e 's/BEGIN_([^FOO])_END/($1)/g'
sed -e 's/BEGIN_(?!FOO)_END/($1)/g'
sed -e 's/BEGIN_(!FOO)_END/($1)/g'
sed -e 's/BEGIN_(FOO)!_END/($1)/g'
sed -e 's/BEGIN_!(FOO)_END/($1)/g'

asked Jan 29 '12 12:01

Anthony

1 Answer

There is no general negation operator in sed, IIRC because compilation of regexes with negation to DFAs takes exponential time. You can work around this with

'/BEGIN_FOO_END/b; s/BEGIN_\(.*\)_END/(\1)/g'

where /BEGIN_FOO_END/b means: if the line contains BEGIN_FOO_END, then branch (jump) to the end of the sed script, so the substitution is skipped for that whole line.
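
As a minimal sketch, the workaround can be tried against the question's sample inputs. The function name replace_unless_foo and the extra mixed-line input are assumptions made for illustration. The script is split into two -e expressions so that the label-less b command ends at the expression boundary, which should behave the same in GNU and BSD sed.

```shell
# Hypothetical helper wrapping the workaround (name is an assumption).
replace_unless_foo() {
  # 1st expression: lines containing BEGIN_FOO_END jump to the end of the script.
  # 2nd expression: on all other lines, wrap the inner substring in parentheses.
  sed -e '/BEGIN_FOO_END/b' -e 's/BEGIN_\(.*\)_END/(\1)/g'
}

printf '%s\n' \
  'BEGIN_bar_END' \
  'BEGIN_ihfd8f398IHFf9f39_END' \
  'BEGIN_FOO_END' \
  'BEGIN_FOO_END BEGIN_bar_END' | replace_unless_foo
# prints:
# (bar)
# (ihfd8f398IHFf9f39)
# BEGIN_FOO_END
# BEGIN_FOO_END BEGIN_bar_END
```

The last input line is printed unchanged because the branch skips the substitution for the entire line, not only for the BEGIN_FOO_END match.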

answered Oct 04 '22 15:10

Fred Foo
