While answering "How to remove the last CR char with cut", I found out that some programs add a trailing newline to the end of a string, while others don't:
Say we have the string foobar and print it with printf so that we don't get an extra newline:
$ printf "foobar" | od -c
0000000 f o o b a r
0000006
Or with echo -n:
$ echo -n "foobar" | od -c
0000000 f o o b a r
0000006
(echo's default behaviour is to print its arguments followed by a newline, so echo "foobar" outputs f o o b a r \n.)
Neither sed nor cat adds any extra character:
$ printf "foobar" | sed 's/./&/g' | od -c
0000000 f o o b a r
0000006
$ printf "foobar" | cat - | od -c
0000000 f o o b a r
0000006
Whereas both awk and cut do. xargs and paste also add this trailing newline:
$ printf "foobar" | cut -b1- | od -c
0000000 f o o b a r \n
0000007
$ printf "foobar" | awk '1' | od -c
0000000 f o o b a r \n
0000007
$ printf "foobar" | xargs | od -c
0000000 f o o b a r \n
0000007
$ printf "foobar" | paste | od -c
0000000 f o o b a r \n
0000007
So I was wondering: why is this different behaviour? Is there anything POSIX suggests about this?
Note I am running all of this in my Bash 4.3.11 and the rest is:
So I was wondering: why is this different behaviour? Is there anything POSIX suggests about this?
Some commands (for example printf) are simple interfaces to libc library calls (e.g. printf()), which don't add \n automatically. Most *NIX text-processing commands add a \n at the end of the last line.
From the Definitions of POSIXv7, a textual line has to have a newline at the end:
3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.
If the newline is missing, it becomes this:
3.195 Incomplete Line
A sequence of one or more non-<newline> characters at the end of the file.
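A quick way to see the two definitions at work is wc -l, which POSIX specifies as counting <newline> characters, so an incomplete last line is not counted. This is only a small sketch; the variable names complete and incomplete are made up for the illustration, and tr -d ' ' is there because some wc implementations pad the number with spaces:

```shell
# Two complete lines: both end in \n.
complete=$(printf 'foo\nbar\n' | wc -l | tr -d ' ')

# One complete line followed by an incomplete line ("bar" has no \n).
incomplete=$(printf 'foo\nbar' | wc -l | tr -d ' ')

echo "complete:   $complete"    # 2
echo "incomplete: $incomplete"  # 1
```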
The general idea is that a text file can be treated as a list of records, where every record is terminated by \n. In other words, \n is not something between lines; it is part of the line. See for example the fgets() function: it keeps the \n in the buffer it returns, which lets the caller tell whether the text line was read completely or not. If the last line is missing the \n, then one has to do more checks to read the file correctly.
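The shell has the same issue as fgets(): read returns a non-zero status when it hits end of file before seeing a \n, even though it has already stored the partial line in the variable. A common idiom for the "more checks" is sketched below; read_lines is a hypothetical helper name, not a standard utility:

```shell
# read_lines: print each line of standard input in brackets.
# The extra [ -n "$line" ] test catches an incomplete last line, which
# read stores in $line while still reporting failure at end of file.
read_lines() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done
}

# "bar" has no terminating newline, yet it is still processed.
printf 'foo\nbar' | read_lines
```

Without the || [ -n "$line" ] part, the loop would stop after foo and silently drop bar.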
In general, as long as your text files are created on *NIX by *NIX programs/scripts, it is fine to expect that the last line is properly terminated. But many Java applications, as well as Windows applications, do not handle that correctly or consistently. Not only do they often forget to add the last \n, they often also incorrectly treat the trailing \n as an additional empty line.
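If you receive such files, one possible way to repair them is to append the missing \n yourself. The following is a minimal sketch, assuming plain text files; ensure_final_newline is a hypothetical helper name. It relies on the fact that command substitution strips a trailing newline, so $(tail -c 1 file) is empty exactly when the last byte is already \n:

```shell
# ensure_final_newline FILE: append \n only if FILE is non-empty and its
# last byte is not already a newline.
ensure_final_newline() {
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

tmp=$(mktemp)
printf 'foobar' > "$tmp"      # incomplete last line
ensure_final_newline "$tmp"   # appends the missing \n
ensure_final_newline "$tmp"   # second call changes nothing
od -c "$tmp"
rm -f "$tmp"
```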