
Why do some commands process lines of redirected STDIN data which are already consumed by other commands?

Suppose we have the following code snippet, with the text file sample.txt redirected into STDIN:

@echo off
< "sample.txt" (
    set /P "ONE="
    set /P "TWO="
    findstr /R "^"
)
echo %ONE%, %TWO%

...and the content of the related text file sample.txt:

first
second
third
fourth

The console output is the following, which is exactly what I expect (the lines first and second are consumed by set /P, hence findstr receives and processes only the remaining lines):

third
fourth
first, second

The same output is achieved when findstr /R "^" is replaced by sort /R.

However, when replacing the findstr command line by find /V "" or by more, the output will be:

first
second
third
fourth
first, second

It seems that although set /P has already consumed the lines first and second (as the last output line proves), find and more still receive the entire redirected data.

Why is this? What causes this behaviour? Is there a way to force find or more to receive only the remaining redirected data that has not already been processed by a preceding command?

(The behaviour is the same when the output data STDOUT is redirected to a text file. Nothing changes either when a command line similar to the batch code above is executed in cmd directly.)

aschipfl asked May 24 '16 20:05


2 Answers

Why do some commands process lines of redirected STDIN data which are already consumed by other commands?

Because some commands/programs rewind stdin before they start reading, so they see the redirected file from its beginning again. You can try this:

@echo off
< "sample.txt" (
    set /P "ONE="
    set /P "TWO="
    more +2
)
echo %ONE%, %TWO%

Result:

    third
    fourth
    first, second

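more rewinds the input as well, but the +2 option makes it skip the first two lines of the file, which are exactly the lines already consumed by set /P.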
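If the number of lines read beforehand is not fixed, the skip count can be kept in a variable. The following is only a minimal sketch under that assumption; the variable name SKIP is made up for illustration and is not part of the original answer. It has to be set before the redirected block, because %SKIP% is expanded when the whole parenthesised block is parsed:

```batch
@echo off
setlocal
rem // SKIP is a hypothetical helper variable: the number of lines that the
rem // `set /P` commands below consume; it must match their count.
set "SKIP=2"
< "sample.txt" (
    set /P "ONE="
    set /P "TWO="
    rem // `more` starts again at the beginning of the file, so it is told
    rem // to skip the lines that have already been read:
    more +%SKIP%
)
echo %ONE%, %TWO%
endlocal
```

With SKIP set to 2 this is the same command sequence as above, so it should give the same result.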

jwdonahue answered Nov 06 '22 08:11


Well, the spot-on answer to the question as to why commands behave the way they do lies in Aacini's comment: »because such commands were programmed this way«.

Anyway, after quite some time, I want to collect my findings and finally present a new work-around I recently found.

There are only a few commands that seem not to reset the data pointer, and each has got its pros and cons:

  1. The usage of findstr to return the remainder of the data is already demonstrated in the question. The problem is that findstr may hang when the redirected input data is not terminated by a final line-break; see "What are the undocumented features and limitations of the Windows FINDSTR command?".

  2. pause does not reset the data pointer (and this is in fact the reason why I wanted to have it mentioned here), regardless of whether the data come from input redirection or from a pipe. Unfortunately, it does not provide the consumed character by any means.

  3. set /P is fine for reading single lines that are not longer than about 1 KB, so to return the remainder of the redirected data you will need some kind of loop:

     @echo off
     rem // Count total number of available lines in advance:
     for /F %%C in ('^< "sample.txt" find /C /V ""') do set "COUNT=%%C"
     < "sample.txt" (
          set /P "ONE="
          set /P "TWO="
          rem /* Loop here to return the rest; `3` is `1 + 2`, where `2`
          rem    is the hard-coded number of lines already handled; you can
          rem    just use `1` here, which will cause read attempts beyond
          rem    the end of data, causing empty lines to be returned: */
          for /L %%N in (3,1,%COUNT%) do (
              rem // Replace `&&` by `&` to NOT skip empty lines:
              set "LINE=" & set /P "LINE=" && call echo(%%LINE%%
          )
     )
     echo %ONE%, %TWO%
    

    Note that set /P cannot be used within pipes; see "Piping into SET /P fails due to uninitialised data pointer?".

  4. Finally, sort can be used to return the remainder. To prevent it from jumbling the lines of text, use the character position option /+n and set n to a number beyond the actual line lengths:

     @echo off
     set "ONE="
     set "TWO="
     < "sample.txt" (
         set /P "ONE="
         set /P "TWO="
         rem /* `sort` with the sort position set beyond the lines of text seems to
         rem    simply reverse the current order; a second reversal restores the
         rem    original order; I set the sort position just beyond the maximum
         rem    record or line length, which I set to the greatest possible value: */
         sort /+65536 /REC 65535 | sort /+65536 /REC 65535
     )
     echo %ONE%, %TWO%
    

    I set the record or line length (/REC) to the greatest possible value, as it defaults to 4096. Note that values below 128 are raised to the minimum of 128, and that line-breaks count towards the record length as well.
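To illustrate item 2 of the list: the following is a minimal sketch, assuming the same sample.txt as in the question. It combines pause with findstr, the latter being one of the commands that does not reset the data pointer. If pause indeed consumes a single character and leaves the pointer where it is, findstr should continue right after that character, so the first returned line should lack its first character:

```batch
@echo off
rem // Sketch only: `pause` reads one character from the redirected data;
rem // its prompt is discarded by `> nul`. The character itself is lost,
rem // since `pause` offers no way to retrieve it.
< "sample.txt" (
    pause > nul
    rem // `findstr` does not rewind, so it returns whatever is left:
    findstr /R "^"
)
```

The caveat from item 1 applies here too: findstr may hang if sample.txt does not end with a line-break.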

aschipfl answered Nov 06 '22 07:11