 

Sed: replacing newlines with "-z"?

Tags:

sed

Problem: use sed to replace a regex that matches \n (a newline).

Solution: there are many similar answers [1][2][3][4], and many other links that I won't list. All of them suggest creating a label :a, appending the next line to the pattern space with N, branching back to :a unless at end-of-file ($!ba), and only then running the actual command.
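For reference, the label-and-branch idiom those answers describe looks roughly like this. It is a minimal sketch assuming GNU sed; BSD/macOS sed may not accept a label followed by ";" and might need the script split across -e arguments.

```shell
#!/bin/sh
# Assumes GNU sed.
#   :a    define a label named "a"
#   N     append "\n" plus the next input line to the pattern space
#   $!ba  unless this is the last line, branch back to label "a"
# Once every line is in the pattern space, s/\n/ /g can see the newlines.
joined=$(seq 3 | sed ':a;N;$!ba;s/\n/ /g')
printf '%s\n' "$joined"   # prints: 1 2 3
```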

That said, the GNU sed manual documents a -z option:

-z
--null-data
--zero-terminated

Treat the input as a set of lines, each terminated by a zero byte
(the ASCII ‘NUL’ character) instead of a newline. This option can
be used with commands like ‘sort -z’ and ‘find -print0’ to process
arbitrary file names. 
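A small sketch of the use case the manual mentions, assuming GNU sed. Here printf stands in for NUL-terminated output such as `find -print0`; the file names are hypothetical, and the first one deliberately contains an embedded newline.

```shell
#!/bin/sh
# Assumes GNU sed (for -z). Each NUL-terminated record is one sed "line",
# so $ anchors at the end of the whole name, even across the embedded "\n".
# tr turns the NUL separators into "|" so they are visible when printed.
renamed=$(printf 'one\ntwo.txt\0three.txt\0' | sed -z 's/\.txt$/.md/' | tr '\0' '|')
printf '%s\n' "$renamed"
```

The name containing a newline is still treated as a single record, which is exactly why -z pairs well with find -print0 and sort -z.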

First, for comparison, here is the naive approach:

$ seq 3 | sed 's/\n/ /g'
1
2
3

However, using this -z option:

$ seq 3 | sed -z 's/\n/ /g'
1 2 3

The Real Question: Why?

Given that it "merges" all the lines, I expected that I would have to use \0 instead of \n, since the documentation says:

Treat the input as a set of lines, each terminated by a zero byte (the ASCII ‘NUL’ character)

I couldn't find any post about this, so I think I might be misunderstanding something here... So, what does -z really do, and why does it work?

asked Sep 27 '18 13:09 by yZaph




1 Answer

Using -z changes what sed considers to be a line. \n remains \n, but it no longer ends a line; the null character (which is represented as \x0 in sed) does. As there are no null bytes in the output of seq, the whole output is considered one line and processed in a single iteration, so every \n in it (including the trailing one) is replaced by a space.
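A quick way to see this, assuming GNU sed: feed input that does contain NUL bytes, and compare it with seq's NUL-free output.

```shell
#!/bin/sh
# Assumes GNU sed. With -z, only NUL ends a record; "\n" is an ordinary
# character inside the pattern space, so s/\n/ /g rewrites it.
two_records=$(printf 'a\nb\0c\nd\0' | sed -z 's/\n/ /g' | tr '\0' '\n')
printf '%s\n' "$two_records"   # prints "a b" and "c d" on separate lines

# seq emits no NUL, so its whole output (including the final "\n") is a
# single record; that last newline becomes a space as well.
one_record=$(seq 3 | sed -z 's/\n/ /g')
printf '[%s]\n' "$one_record"  # prints: [1 2 3 ]
```

Since the last record has no terminating NUL, sed does not add one on output, which is why the result ends in a space rather than a newline.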

answered Sep 19 '22 04:09 by choroba