What is the correct syntax for finding a substring (a string which is preceded and followed by specific strings) which does not match a specific pattern?
For example, I want to take all substrings which start with BEGIN_, end with _END, and whose substring in between is not equal to FOO, and replace the whole substring with the format "(inner substring)". The following would match:
BEGIN_bar_END -> (bar)
BEGIN_buz_END -> (buz)
BEGIN_ihfd8f398IHFf9f39_END -> (ihfd8f398IHFf9f39)
But BEGIN_FOO_END would not match.
I have played around with the following, but cannot seem to find the correct syntax:
sed -e 's/BEGIN_(^FOO)_END/($1)/g'
sed -e 's/BEGIN_([^FOO])_END/($1)/g'
sed -e 's/BEGIN_(?!FOO)_END/($1)/g'
sed -e 's/BEGIN_(!FOO)_END/($1)/g'
sed -e 's/BEGIN_(FOO)!_END/($1)/g'
sed -e 's/BEGIN_!(FOO)_END/($1)/g'
The sed command supports a long list of operations that ease the editing of text files. It lets users apply expressions commonly found in programming languages; one of the core supported expression types is the regular expression (regex).
Using \1 to keep part of the pattern
You can use this to exclude part of the characters matched by the regular expression. The "\1" is the first remembered pattern, and the "\2" is the second remembered pattern. Sed has up to nine remembered patterns. For example, applying s/\([a-z]*\).*/\1/ to the input abcd123 will output "abcd" and delete the numbers.
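A minimal sketch of remembered patterns in action. The function names keep_letters and swap_words are made up for illustration; the sed expressions use only basic regular expression syntax.

```shell
# keep_letters: capture the leading lowercase letters with \( ... \)
# and replace the whole line with just that first group (\1).
keep_letters() {
  printf '%s\n' "$1" | sed 's/\([a-z]*\).*/\1/'
}

# swap_words: capture two lowercase words and emit them in reverse
# order, showing that \2 refers to the second group.
swap_words() {
  printf '%s\n' "$1" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'
}

keep_letters abcd123        # prints: abcd
swap_words 'hello world'    # prints: world hello
```

Note that sed writes back-references as \1, \2 and so on; the $1 form used in the question's attempts is Perl-style and is inserted literally by sed.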
As Avinash Raj has pointed out, sed uses basic regular expression (BRE) syntax by default (which requires (, ), {, } to be preceded by \ to activate their special meaning), and the -r option switches over to extended regular expression (ERE) syntax, which treats (, ), {, } as special without a preceding \.
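The difference can be sketched with the same substitution written both ways. This assumes a sed that accepts -E for extended syntax (GNU sed and BSD sed both do; -r is the older GNU spelling). The function names bre, ere and literal_parens are hypothetical.

```shell
# BRE (default): grouping parentheses must be escaped.
bre() {
  printf '%s\n' "$1" | sed 's/BEGIN_\(.*\)_END/(\1)/'
}

# ERE (-E): grouping parentheses are special without a backslash.
ere() {
  printf '%s\n' "$1" | sed -E 's/BEGIN_(.*)_END/(\1)/'
}

# In BRE an unescaped ( is an ordinary character, so this matches
# the literal text "(x)" rather than creating a group.
literal_parens() {
  printf '%s\n' "$1" | sed 's/(x)/y/'
}

bre BEGIN_bar_END       # prints: (bar)
ere BEGIN_bar_END       # prints: (bar)
literal_parens 'a(x)b'  # prints: ayb
```

In both modes the parentheses in the replacement part are always literal, which is why (\1) produces the wrapped output.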
[list] and [^list]: matches any single character in list; for example, [aeiou] matches all vowels. A list may include sequences like char1-char2, which matches any character between (inclusive) char1 and char2. A leading ^ reverses the meaning of list, so that it matches any single character not in list.
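A short sketch of why a negated list cannot exclude a whole word: [^FOO] stands for exactly one character that is neither F nor O. The function names strip_vowels and try_class are made up for illustration.

```shell
# strip_vowels: [aeiou] matches any single vowel; delete them all.
strip_vowels() {
  printf '%s\n' "$1" | sed 's/[aeiou]//g'
}

# try_class: the second attempt from the question, rewritten in BRE
# with \1. The group can only ever hold ONE character.
try_class() {
  printf '%s\n' "$1" | sed 's/BEGIN_\([^FOO]\)_END/(\1)/g'
}

strip_vowels education   # prints: dctn
try_class BEGIN_x_END    # prints: (x)
try_class BEGIN_bar_END  # prints: BEGIN_bar_END (three characters, no match)
try_class BEGIN_F_END    # prints: BEGIN_F_END   (F is in the list, no match)
```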
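There is no general negation operator in sed, IIRC because compilation of regexes with negation to DFAs takes exponential time. You can work around this with
sed -e '/BEGIN_FOO_END/b; s/BEGIN_\(.*\)_END/(\1)/g'
where /BEGIN_FOO_END/b means: if the line contains BEGIN_FOO_END, then branch (jump) to the end of the sed script, so the substitution is skipped for that line.
Note that this works line by line: a line containing BEGIN_FOO_END is left untouched even if it also contains other substrings that should be replaced, and because .* is greedy, several matches on one line are merged into a single one.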
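The branch workaround can be tried as follows, together with one possible variant for lines that hold several tokens. The variant is only a sketch, not part of the answer above: it assumes the inner substring never contains an underscore, and it temporarily hides BEGIN_FOO_END behind a newline, a character that cannot otherwise occur in a line sed has just read. The function names convert_lines and convert_tokens are hypothetical.

```shell
# convert_lines: the workaround from the answer. The b command is given
# its own -e so that it also parses on seds that read a label up to the
# end of the line.
convert_lines() {
  sed -e '/BEGIN_FOO_END/b' -e 's/BEGIN_\(.*\)_END/(\1)/g'
}

# convert_tokens (assumption: inner text has no underscore):
#   1. replace every BEGIN_FOO_END with a newline placeholder,
#   2. wrap the remaining tokens, using [^_]* instead of greedy .*,
#   3. turn the placeholder back into BEGIN_FOO_END.
convert_tokens() {
  sed -e 's/BEGIN_FOO_END/\
/g' \
      -e 's/BEGIN_\([^_]*\)_END/(\1)/g' \
      -e 's/\n/BEGIN_FOO_END/g'
}

printf 'BEGIN_bar_END\nBEGIN_FOO_END\n' | convert_lines
# prints:
# (bar)
# BEGIN_FOO_END

printf 'BEGIN_bar_END BEGIN_FOO_END BEGIN_buz_END\n' | convert_tokens
# prints: (bar) BEGIN_FOO_END (buz)
```

With one token per line both functions give the same result; they differ only when a line mixes BEGIN_FOO_END with other tokens or contains more than one token.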