
bash pipe vs here-string

Tags: bash, pipe

I thought these commands were equivalent in bash, but they produce different output. Could you help me understand why?

$ echo "SEBA" | wc
      1       1       5
$ wc <<< "SEBA"
1 1 5

Running on

  • Ubuntu 20.04.2 LTS
  • GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
  • wc (GNU coreutils) 8.30

Here are some tests:

$ echo "SEBA" | wc | hexdump 
0000000 2020 2020 2020 2031 2020 2020 2020 2031
0000010 2020 2020 2020 0a35                    
0000018
$ wc <<< "SEBA" | hexdump 
0000000 2031 2031 0a35                         
0000006
$ echo "SEBA" | hexdump 
0000000 4553 4142 000a                         
0000005
$ hexdump <<< "SEBA"
0000000 4553 4142 000a                         
0000005
Sebastian Sejzer asked Mar 02 '23 12:03



1 Answer

When GNU wc gets all its input from regular files, it uses stat() (or fstat() for stdin) to get the sizes of all the files in bytes. From this it can determine the maximum number of digits needed for each output field, and it uses only that many digits.

When any of the inputs is a pipe, its size can't be determined ahead of time, so wc falls back to a minimum field width of 7 digits.

Here-strings are implemented by copying the string to a temporary file and redirecting stdin to that file, so this case is able to use the optimized field size. But piping from echo doesn't permit this, so it gets 7-digit fields.

See the functions get_input_fstatus and compute_number_width in the GNU coreutils source.
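The rule described above can be sketched in a few lines of shell. This is a simplified illustration, not the coreutils code: the helper names number_width and format_counts are made up for this example, and the caller passes in the total size and whether every input is a regular file instead of calling stat().

```shell
# Simplified sketch of the width rule described above; not the coreutils source.
# number_width TOTAL_BYTES ALL_REGULAR   (ALL_REGULAR is "yes" or "no")
number_width() {
  total=$1
  width=1
  # Start at one digit and add one for each further decimal digit in the total size.
  while [ "$total" -ge 10 ]; do
    total=$((total / 10))
    width=$((width + 1))
  done
  # A pipe or other non-regular input forces at least 7 digits.
  if [ "$2" = no ] && [ "$width" -lt 7 ]; then
    width=7
  fi
  echo "$width"
}

# format_counts WIDTH LINES WORDS BYTES: right-align each count in WIDTH columns.
format_counts() {
  printf "%${1}d %${1}d %${1}d\n" "$2" "$3" "$4"
}

format_counts "$(number_width 5 yes)" 1 1 5   # 5-byte regular file
format_counts "$(number_width 5 no)" 1 1 5    # 5 bytes arriving on a pipe
```

With a 5-byte input, the first call prints the compact form and the second prints the 7-column form, matching the two outputs in the question.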

As noted in a comment, bash 5.1 doesn't use a temporary file for small here-strings or here-documents; it uses a pipe. "Small" may not be very small: the limit is the pipe buffer size. As explained at How big is the pipe buffer?, this defaults to 16K on Mac OS X and 64K on Linux. So you shouldn't depend on this behavior being the same across bash versions.
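To see which mechanism a given bash uses, you can ask the shell what kind of file stdin is. This is a minimal sketch: stdin_kind is a hypothetical helper name, and it assumes /dev/stdin is available (bash's own test builtin handles that path itself, and Linux provides it as a device link).

```shell
# stdin_kind: report what kind of file the current standard input is.
stdin_kind() {
  if [ -p /dev/stdin ]; then
    echo pipe
  elif [ -f /dev/stdin ]; then
    echo "regular file"
  else
    echo other
  fi
}

echo "SEBA" | stdin_kind     # prints "pipe"
# stdin_kind <<< "SEBA"      # try this in bash: the answer depends on the version
```

Going by the explanation above, the here-string line should report a regular file on bash 5.0 and a pipe on bash 5.1 for a string this short.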

Barmar answered Mar 05 '23 15:03
