how to tell sed "dot match new line"

Tags: sed

I can't figure out how to make the dot in sed match a newline:

echo -e "one\ntwo\nthree" | sed 's/one.*two/one/m'

I expect to get:

one
three

Instead I get the original input:

one
two
three

theta asked Dec 24 '11 11:12



3 Answers

If you use GNU sed, a plain . can match any character, including line break characters. From the GNU sed documentation:

.
         Matches any character, including newline.

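All you need is the -z option, which makes sed treat the input as NUL-separated records, so a text file without NUL bytes lands in the pattern space as a whole: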

echo -e "one\ntwo\nthree" | sed -z 's/one.*two/one/'
# => one
#    three

See the online sed demo.

However, one.*two might not be what you need, since * is always greedy in POSIX regex patterns. one.*two matches the leftmost one, then any 0 or more characters, as many as possible, and then the rightmost two. If you need to match one, then any 0 or more characters, as few as possible, and then the leftmost two, you will have to use perl:

perl -i -0 -pe 's/one.*?two//sg' file             # Non-Unicode version
perl -i -CSD -Mutf8 -0 -pe 's/one.*?two//sg' file # S&R in a UTF8 file 

The -0 option sets the input record separator to the NUL character, so a file that contains no NUL bytes is read as a whole rather than line by line (-0777 is the explicit slurp mode); -i enables in-place file modification; the s flag makes . match any character, including line breaks; and .*? matches any 0 or more characters, as few as possible, because *? is non-greedy. The -CSD -Mutf8 part makes sure your input is decoded and your output is re-encoded correctly.

Wiktor Stribiżew answered Sep 22 '22 17:09


sed is a line-based tool. I don't think there is an option for this.
You can use h/H (hold) and g/G (get) instead:

$ echo -e 'one\ntwo\nthree' | sed -n '1h;1!H;${g;s/one.*two/one/p}'
one
three
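The same one-liner can be spelled out as a commented multi-line script, which may make the hold-space steps easier to follow. This is only an expanded sketch of the command above; the variable name result is illustrative.

```shell
result=$(printf 'one\ntwo\nthree\n' | sed -n '
# line 1: copy the pattern space into the (still empty) hold space
1h
# every other line: append a newline plus the line to the hold space
1!H
# last line: fetch the accumulated text and run the substitution on it
${
  g
  s/one.*two/one/p
}
')
printf '%s\n' "$result"
```

Because the whole input is collected before the substitution runs, this approach holds the entire file in memory.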

Alternatively, you could try Vim, where \_. matches any character including a newline:

:%s/one\_.*two/one/g

kev answered Sep 23 '22 17:09


This might work for you:

<<<$'one\ntwo\nthree' sed '/two/d'

or

<<<$'one\ntwo\nthree' sed '2d'

or

<<<$'one\ntwo\nthree' sed 'n;d'

or

<<<$'one\ntwo\nthree' sed 'N;N;s/two.//'

Sed does match all characters (including the \n) with a dot, but it has usually already stripped the trailing \n as part of its cycle, so the newline is no longer present in the pattern space to be matched.
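A quick way to see this, assuming GNU sed (where the dot does match an embedded newline), is to replace every character matched by the dot with X. The variable names per_line and joined are illustrative.

```shell
# Default cycle: each line arrives without its newline, so the dot never sees one.
per_line=$(printf 'one\ntwo\n' | sed 's/./X/g')

# After N the pattern space holds "one\ntwo", and the dot matches the newline too.
joined=$(printf 'one\ntwo\n' | sed 'N;s/./X/g')

printf '%s\n---\n%s\n' "$per_line" "$joined"
```

The first command keeps two separate lines of XXX, while the second collapses everything into one line of seven X characters.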

Only certain commands (N, H and G) introduce newlines into the pattern or hold space:

  1. N appends a newline to the pattern space and then appends the next line of input.
  2. H appends a newline to the hold space and then appends the pattern space to it.
  3. G appends a newline to the pattern space and then appends whatever is in the hold space.
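The three commands can be tried on tiny inputs to see where each newline ends up. This is a minimal sketch; the sample lines and the variable names with_n, with_h and with_g are made up for illustration.

```shell
# N: join each pair of lines, then turn the embedded newline into "+".
with_n=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/+/')

# H: collect every line in the hold space; the first H on an empty hold
# space leaves a leading newline, visible as an empty first output line.
with_h=$(printf 'a\nb\n' | sed -n 'H;${x;p}')

# G: save line 1 in the hold space, then append it to line 2.
with_g=$(printf 'a\nb\n' | sed '1h;2G')

printf 'N:\n%s\nH:\n%s\nG:\n%s\n' "$with_n" "$with_h" "$with_g"
```

The empty line at the start of the H output is the reason the common idiom uses 1h;1!H rather than a bare H.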

The hold space is empty until you place something in it, so:

sed G file

will insert an empty line after each line.

sed 'G;G' file

will insert two empty lines, and so on.

potong answered Sep 21 '22 17:09