
Trailing new line after piping to a command: is there any standard?

Tags: bash, unix, posix, gnu

While answering "How to remove the last CR char with cut", I found that some programs add a trailing newline to the end of a string, while others don't:

Say we have the string foobar and print it with printf, so that we don't get an extra newline:

$ printf "foobar" | od -c
0000000   f   o   o   b   a   r
0000006

Or with echo -n:

$ echo -n "foobar" | od -c
0000000   f   o   o   b   a   r
0000006

(By default, echo prints its arguments followed by a newline, so echo "foobar" outputs f o o b a r \n.)

Neither sed nor cat adds any extra character:

$ printf "foobar" | sed 's/./&/g' | od -c
0000000   f   o   o   b   a   r
0000006
$ printf "foobar" | cat - | od -c
0000000   f   o   o   b   a   r
0000006

Both awk and cut, however, do add a trailing newline, and so do xargs and paste:

$ printf "foobar" | cut -b1- | od -c
0000000   f   o   o   b   a   r  \n
0000007
$ printf "foobar" | awk '1' | od -c
0000000   f   o   o   b   a   r  \n
0000007
$ printf "foobar" | xargs | od -c
0000000   f   o   o   b   a   r  \n
0000007
$ printf "foobar" | paste | od -c
0000000   f   o   o   b   a   r  \n
0000007

So I was wondering: why do these tools behave differently? Does POSIX say anything about this?

Note that I am running all of this in Bash 4.3.11, with the following tool versions:

  • GNU Awk 4.0.1
  • sed (GNU sed) 4.2.2
  • cat (GNU coreutils) 8.21
  • cut (GNU coreutils) 8.21
  • xargs (GNU findutils) 4.4.2
  • paste (GNU coreutils) 8.21
fedorqui 'SO stop harming' asked Nov 09 '22 16:11


1 Answer

So I was wondering: why do these tools behave differently? Does POSIX say anything about this?

Some commands (printf, for example) are thin interfaces to libc library calls (e.g. printf()), which don't add a \n automatically. Most *NIX text-processing commands, however, add a \n at the end of the last line.

According to the Definitions of POSIXv7, a line of text has to end with a newline:

3.206 Line

A sequence of zero or more non- <newline> characters plus a terminating <newline> character.

If the terminating newline is missing, the text is an incomplete line instead:

3.195 Incomplete Line

A sequence of one or more non- <newline> characters at the end of the file.
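
To make the two definitions concrete, here is a minimal sketch of how one might test whether a file ends in an incomplete line. The helper name ends_with_newline is made up for this example, and mktemp is assumed to be available (it is not required by POSIX). The check relies on command substitution stripping a trailing newline, so it is meant for text files only.

```shell
# Hypothetical helper: succeed if FILE ends in a complete line.
ends_with_newline() {
    # An empty file contains no incomplete line, so treat it as fine.
    [ -s "$1" ] || return 0
    # $(...) strips a trailing newline, so the result is empty
    # exactly when the last byte of the file is \n.
    [ -z "$(tail -c 1 "$1")" ]
}

tmp=$(mktemp)   # assumption: mktemp exists on this system

printf 'foobar' > "$tmp"
ends_with_newline "$tmp" && echo complete || echo incomplete   # incomplete

printf 'foobar\n' > "$tmp"
ends_with_newline "$tmp" && echo complete || echo incomplete   # complete

rm -f "$tmp"
```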

The general idea is that a text file can be treated as a list of records, where every record is terminated by \n. In other words, \n is not a separator between lines; it is part of the line. See, for example, the fgets() function: it keeps the \n in the buffer it returns, so the presence of the \n tells you whether the whole line was read. If the last line is missing the \n, the reader has to do extra checks to read the file correctly.
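
The same "extra checks" show up in shell scripts. As a rough illustration (the function names count_naive and count_robust are invented for this sketch), a plain while read loop skips an unterminated last line, because read reports failure at end of file even though it has already stored the partial text in the variable:

```shell
# Naive loop: an incomplete last line is silently dropped.
count_naive() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

# Extra check: also handle a final chunk that has no terminating \n.
count_robust() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

printf 'one\ntwo' | count_naive    # 1
printf 'one\ntwo' | count_robust   # 2
printf 'one\ntwo' | wc -l          # also reports 1: wc -l counts \n characters
```

With properly terminated input (printf 'one\ntwo\n'), both functions agree and print 2.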

In general, as long as your text files are created on *NIX by *NIX programs or scripts, it is safe to expect that the last line is properly terminated. But many Java applications, as well as Windows applications, do not handle this correctly or consistently. Not only do they often omit the final \n, they often also incorrectly treat a trailing \n as an additional empty line.
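
If you receive such files, one possible way to repair them is to reuse the behaviour observed in the question: awk '1' prints every record followed by a newline, so it adds the final \n only when it is missing. The wrapper name fix_last_line is made up for this sketch. It does nothing about Windows \r\n line endings, and it assumes ordinary text input.

```shell
# Hypothetical filter: pass text through, completing the last line if needed.
fix_last_line() {
    # The pattern 1 is always true, so awk prints each record followed by
    # its output record separator (\n by default).
    awk '1'
}

printf 'one\ntwo'   | fix_last_line | od -c   # output now ends in \n
printf 'one\ntwo\n' | fix_last_line | od -c   # unchanged, still a single final \n
```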

Dummy00001 answered Dec 06 '22 21:12