
Sed regex and substring negation

Tags: regex, sed

What is the correct syntax for finding a substring (a string which is preceded and followed by specific strings) which does not match a specific pattern?

For example, I want to take every substring that starts with BEGIN_, ends with _END, and whose inner part is not equal to FOO, and replace the whole substring with the format "(inner substring)". The following would match:

  • BEGIN_bar_END -> (bar)
  • BEGIN_buz_END -> (buz)
  • BEGIN_ihfd8f398IHFf9f39_END -> (ihfd8f398IHFf9f39)

But BEGIN_FOO_END would not match.

I have played around with the following, but cannot seem to find the correct syntax:

sed -e 's/BEGIN_(^FOO)_END/($1)/g'
sed -e 's/BEGIN_([^FOO])_END/($1)/g'
sed -e 's/BEGIN_(?!FOO)_END/($1)/g'
sed -e 's/BEGIN_(!FOO)_END/($1)/g'
sed -e 's/BEGIN_(FOO)!_END/($1)/g'
sed -e 's/BEGIN_!(FOO)_END/($1)/g'
Anthony asked Jan 29 '12 12:01



1 Answer

There is no general negation operator in sed, IIRC because compilation of regexes with negation to DFAs takes exponential time. You can work around this with

sed -e '/BEGIN_FOO_END/b; s/BEGIN_\(.*\)_END/(\1)/g'

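where /BEGIN_FOO_END/b means: if the current line contains BEGIN_FOO_END, then branch (jump) to the end of the sed script, so the substitution is skipped for that line. Note that this works per line: a line containing BEGIN_FOO_END is left untouched even if it also contains other BEGIN_..._END substrings, and the greedy .* spans from the first BEGIN_ to the last _END on a line.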
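
To try this out, here is a rough sketch that wraps the branch-and-substitute script in a shell function and feeds it the sample strings from the question. The function names wrap_tokens and wrap_tokens_mixed are made up for illustration. The second function is not part of the answer: it is one possible variant for lines that mix FOO and non-FOO tokens, and it assumes GNU sed (for \n in the replacement text) and purely alphanumeric inner strings.

```shell
#!/bin/sh

# Hypothetical helper: the branch-and-substitute approach from the answer.
# Any line containing BEGIN_FOO_END hits the 'b' command and is printed unchanged;
# on every other line BEGIN_<inner>_END is rewritten to (<inner>).
wrap_tokens() {
    sed -e '/BEGIN_FOO_END/b' -e 's/BEGIN_\(.*\)_END/(\1)/g'
}

# Hypothetical variant (an assumption, not from the answer) for mixed lines.
# 1. Hide each BEGIN_FOO_END behind a newline, which cannot otherwise occur
#    in the pattern space when sed reads input line by line.
# 2. Rewrite the remaining tokens; [[:alnum:]]* cannot cross an underscore,
#    so each token is matched on its own.
# 3. Turn the newline placeholders back into BEGIN_FOO_END.
wrap_tokens_mixed() {
    sed -e 's/BEGIN_FOO_END/\n/g' \
        -e 's/BEGIN_\([[:alnum:]]*\)_END/(\1)/g' \
        -e 's/\n/BEGIN_FOO_END/g'
}

# Sample input taken from the question.
printf '%s\n' 'BEGIN_bar_END' 'BEGIN_FOO_END' 'BEGIN_ihfd8f398IHFf9f39_END' | wrap_tokens

# One line that mixes an excluded token with two that should be rewritten.
printf '%s\n' 'BEGIN_bar_END BEGIN_FOO_END BEGIN_buz_END' | wrap_tokens_mixed
```

The first call should print (bar), BEGIN_FOO_END and (ihfd8f398IHFf9f39) on separate lines. With GNU sed, the second call should print (bar) BEGIN_FOO_END (buz). If the inner strings can contain underscores or other characters, the bracket expression in step 2 would need adjusting.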

Fred Foo answered Oct 04 '22 15:10