Why does FOR /F set empty values for the remaining tokens when token numbers are repeated?

Tags:

batch-file

I'm not sure the question is clear enough, so here's an example:

:::this prints - 1:[i] 2:[] 3:[] 4:[] 5:[] 6:[] 7:[]
for /f "tokens=1,1,1,1,1,1,1" %%a in ("i ii iii iv v vi vii") do (
    @echo 1:[%%a] 2:[%%b] 3:[%%c] 4:[%%d] 5:[%%e] 6:[%%f] 7:[%%g]
)

:::this prints - 1:[i] 2:[ii] 3:[iii] 4:[iv] 5:[] 6:[] 7:[%g]
for /f "tokens=2,3,1-4" %%a in ("i ii iii iv v vi vii") do (
    @echo 1:[%%a] 2:[%%b] 3:[%%c] 4:[%%d] 5:[%%e] 6:[%%f] 7:[%%g]
)

:::this prints - 1:[i] 2:[ii] 3:[iii] 4:[] 5:[] 6:[] 7:[%g]
for /f "tokens=1-3,1-3," %%a in ("i ii iii iv v vi vii") do (
    @echo 1:[%%a] 2:[%%b] 3:[%%c] 4:[%%d] 5:[%%e] 6:[%%f] 7:[%%g]
)

In brief, if there are repeated numbers in the list of tokens (it doesn't matter whether they appear inside n-m ranges or are listed one by one with commas), the same number of trailing variables end up with empty values.
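The observation can be checked against the three outputs above with a small counting sketch. It is written in Python rather than batch and is only a model of the observed results, not of how cmd.exe is implemented. The helper name `expand` is made up for this illustration, and it assumes every item is a plain number or an m-n range.

```python
def expand(spec):
    """Expand a tokens= value such as "2,3,1-4" into the requested numbers."""
    numbers = []
    for item in spec.split(","):
        if not item:
            continue  # ignore an empty item, e.g. after a trailing comma
        first, _, last = item.partition("-")
        numbers.extend(range(int(first), int(last or first) + 1))
    return numbers


def count_slots(spec):
    """Return (allocated variables, filled variables, empty variables)."""
    requested = expand(spec)
    filled = len(set(requested))  # repeated numbers count only once
    return len(requested), filled, len(requested) - filled


for spec in ("1,1,1,1,1,1,1", "2,3,1-4", "1-3,1-3,"):
    print(spec, count_slots(spec))
```

For the three examples this gives 6, 2 and 3 empty variables respectively, which matches the number of `[]` entries in the printed output.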

This behavior is not documented anywhere (or at least I didn't find it). Here's the part of the FOR help that concerns tokens:

tokens=x,y,m-n  - specifies which tokens from each line are to
                  be passed to the for body for each iteration.
                  This will cause additional variable names to
                  be allocated.  The m-n form is a range,
                  specifying the mth through the nth tokens.  If
                  the last character in the tokens= string is an
                  asterisk, then an additional variable is
                  allocated and receives the remaining text on
                  the line after the last token parsed.

This was tested on Win8 x64, so I'm not sure whether it happens on all Windows versions.

EDIT: Although the accessible tokens are limited to 31, with this I can create more empty tokens:

setlocal disableDelayedExpansion
for /f "tokens=1-31,1-31,1-31" %%! in (
"33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 "
) do (
 echo 1:[%%!-!]  30:[%%?-?] 31:[%%@-@] 32:[%%A-A] 33:[%%B-B] 34:[%%C-C] 35:[%%D-D] 36:[%%E-E] 37:[%%F-F] 38:[%%G-G] 90:[%%{-{] 
)

EDIT: The maximum number of empty tokens is 250 (I'm not sure how the extended ASCII characters between 0x02 and 0xFB will be displayed):

@echo off
for /f "tokens=1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31" %% in (
"1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1") do (
    echo 0x02-%%- 0x07-%%- 0xFE-%%ю-  0xFB-%%ы- 0xFA-%%ъ- 
)
npocmaka asked Mar 19 '23 10:03


2 Answers

While I have no real idea why the for command behaves as it does, there are some simple rules that match its behaviour. Here we are talking only about the tokens clause; delims, eol, skip and usebackq are for another day.

Step 1 - The tokens clause is found. The clause is parsed, and each request (a single number, a start-end range, or *) is checked for validity. A request that is not valid (outside the range 1-31 and not *) is discarded. For a valid request, a "variable" is allocated for each element requested (probably a slot in a table) to later hold the data retrieved for that token. At the same time, a "set" is updated (maybe a bitmap mask) to record that token number x (the number used to identify the token in the tokens clause) will be retrieved. The same token can be requested several times, but in the "set" (or bitmask, ...) the only effect is to mark again that token x will be retrieved.

Now the "set" contains the position of the valid (1-31, *) tokens that were requested.

Once the parser has finished processing the for configuration, the input file is read into memory, or the command is executed and all its output is retrieved into memory, or the literal string is declared as the input buffer.

Step 2 - Prepare line parse. The table that holds the token data is initialized to blanks, and a pointer is set to the first position in the table (the first token). If the line has not been discarded by skip, by eol, or because it is empty, the tokenizer scans the input buffer for tokens; otherwise it searches for the end of the line and repeats step 2 for the next line.

Step 3 - Parse the input buffer. Until the end of the line is reached, the position of each token found in the line, if in range (1-31 or the * token), is checked against the "set" to determine whether it has been requested (whether this token number is in the set, or whether the * token is being handled). If it has been requested, its data is stored in the "table" at the position indicated by the table pointer, the pointer is incremented, and the tokenizer repeats step 3 until the end of the line is reached.

Step 4 - The end of the line has been reached. If any token has been retrieved, or if the only token requested was * (test with for /f "tokens=*" %a in (" ") do echo %a), the code in the do clause is executed.

Step 5 - If the execution of the for has not been canceled and the end of the buffer has not been reached, there are more lines to process, so go back to step 2.

This set of steps reproduces all the behaviours observed in the question, but it does not prove that this is how the for command is coded.
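As a rough illustration, steps 1 to 4 can be sketched as a small Python model. This is only a sketch of the rules described above, not of the real cmd.exe code. The function names are invented for the example, only the default delimiters (space and tab) are handled, and the `*` token is deliberately left out.

```python
import re


def parse_tokens_clause(spec):
    """Step 1: return (slot_count, wanted_set) for a tokens= value.

    Only plain numbers and m-n ranges are modelled; '*' is not handled.
    """
    slots = 0
    wanted = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # e.g. the trailing comma in "1-3,1-3,"
        first, _, last = item.partition("-")
        low = int(first)
        high = int(last) if last else low
        if not (1 <= low <= high <= 31):
            continue  # invalid request: discarded in this model
        slots += high - low + 1              # one variable per requested element
        wanted.update(range(low, high + 1))  # a repeat only re-marks the same entry
    return slots, wanted


def for_f_line(spec, line):
    """Steps 2-4 for a single line; returns the table passed to the do body.

    Returns None when no token was retrieved (the body would not run).
    """
    slots, wanted = parse_tokens_clause(spec)
    table = [""] * slots  # step 2: table initialised to blanks
    pointer = 0
    words = [w for w in re.split(r"[ \t]+", line) if w]
    for position, word in enumerate(words, start=1):  # step 3
        if position in wanted and pointer < slots:
            table[pointer] = word
            pointer += 1
    return table if pointer else None  # step 4


line = "i ii iii iv v vi vii"
print(for_f_line("1,1,1,1,1,1,1", line))
print(for_f_line("2,3,1-4", line))
print(for_f_line("1-3,1-3,", line))
```

With the question's three tokens clauses, the model fills one, four and three leading slots and leaves the rest blank, in line with the printed results. Variables beyond the allocated slots (such as %%g in the second and third cases) are not in the table at all, which is why they are echoed literally.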

Now, let's check it against the code in the question

:::this prints - 1:[i] 2:[] 3:[] 4:[] 5:[] 6:[] 7:[]
for /f "tokens=1,1,1,1,1,1,1" %%a in ("i ii iii iv v vi vii") do (
    @echo 1:[%%a] 2:[%%b] 3:[%%c] 4:[%%d] 5:[%%e] 6:[%%f] 7:[%%g]
)

7 requested tokens, so 7 positions in the table that will be passed to the do code, but the only token that matches the "set" is the number 1

:::this prints - 1:[i] 2:[ii] 3:[iii] 4:[iv] 5:[] 6:[] 7:[%g]
for /f "tokens=2,3,1-4" %%a in ("i ii iii iv v vi vii") do (
    @echo 1:[%%a] 2:[%%b] 3:[%%c] 4:[%%d] 5:[%%e] 6:[%%f] 7:[%%g]
)

6 requested tokens, 6 positions in the table of tokens, and the "set" will only match 1,2,3,4

:::this prints - 1:[i] 2:[ii] 3:[iii] 4:[] 5:[] 6:[] 7:[%g]
for /f "tokens=1-3,1-3," %%a in ("i ii iii iv v vi vii") do (
    @echo 1:[%%a] 2:[%%b] 3:[%%c] 4:[%%d] 5:[%%e] 6:[%%f] 7:[%%g]
)

6 requested tokens, 6 positions in the table of tokens, and the "set" will only match 1,2,3

setlocal disableDelayedExpansion
for /f "tokens=1-31,1-31,1-31" %%! in (
"33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 "
) do (
 echo 1:[%%!-!]  30:[%%?-?] 31:[%%@-@] 32:[%%A-A] 33:[%%B-B] 34:[%%C-C] 35:[%%D-D] 36:[%%E-E] 37:[%%F-F] 38:[%%G-G] 90:[%%{-{] 
)

93 requested tokens, 93 positions allocated in the table of tokens, the "set" will only match elements 1-31
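To see which variable name ends up on which table position, here is a minimal Python sketch. It assumes that variables are assigned in consecutive character-code order starting from the declared one, which is what the question's code relies on. The helper name `variable_for_slot` is made up for the illustration.

```python
def variable_for_slot(first_variable, slot):
    """Return the FOR variable character for a 1-based table position."""
    return chr(ord(first_variable) + slot - 1)


slots = 3 * 31  # tokens=1-31,1-31,1-31
last_filled = variable_for_slot("!", 31)       # last position that receives data
first_blank = variable_for_slot("!", 32)       # first position left blank
last_allocated = variable_for_slot("!", slots)

print(slots, last_filled, first_blank, last_allocated)
```

Under that assumption the 93 positions run from %%! to %%}, with data only in %%! through %%? and blanks from %%@ onward.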

Edited: more cases were added to the question.

Regarding the claim that "the maximum of the empty tokens is 250":

@echo off
for /f "tokens=1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31,1-31" %% in (
"1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1") do (
    echo 0x02-%%- 0x07-%%- 0xFE-%%ю-  0xFB-%%ы- 0xFA-%%ъ- 
)

No, you can request as many tokens as you want. I tested with 1625 repetitions of 1-30 plus an additional 31 (to ensure the parser keeps working), and it is handled without problems. The limit is probably the line length. You can request up to approximately 50530 tokens (repeating 1-31,... until the line limit is reached), but you only get valid data for the first 31 tokens and blank data for the rest of the elements in the storage table, and you still have to retrieve elements using a single character as the for replaceable parameter. Using %%^A (0x01, Alt-001) as the for replaceable parameter, you can request up to %%ÿ (0xFF, Alt-255).
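The figures above can be sanity-checked with a little arithmetic, shown here as a Python sketch. It assumes that each repetition is written as `1-31,` (5 characters) and that the relevant limit is the commonly cited 8191-character cmd.exe command line. Neither assumption is confirmed by the tests described in this answer.

```python
TOKENS_PER_RANGE = 31            # one "1-31" range allocates 31 variables
CHARS_PER_RANGE = len("1-31,")   # 5 characters per repetition, comma included
ASSUMED_LINE_LIMIT = 8191        # assumption: cmd.exe command-line limit

repetitions = 50530 // TOKENS_PER_RANGE
clause_length = repetitions * CHARS_PER_RANGE

print(repetitions)                          # number of "1-31," repetitions
print(repetitions * TOKENS_PER_RANGE)       # tokens requested
print(clause_length, ASSUMED_LINE_LIMIT - clause_length)

# Variables addressable with a single character, from 0x01 up to 0xFF
addressable = 0xFF - 0x01 + 1
print(addressable)
```

So about 1630 repetitions fit, using 8150 characters and leaving roughly 41 for the rest of the command, while only 255 of the requested positions can actually be read back through single-character variables.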

MC ND answered Apr 08 '23 05:04


I also don't have an explanation, but I do have an additional effect.

The * "token" is still accepted, but it will always be empty (dysfunctional) if there is at least one duplicate token request.

@echo off
for /f "tokens=1,1,2*" %%a in ("1 2 3 4") do (
  echo a=%%a
  echo b=%%b
  echo c=%%c
  echo d=%%d
  echo e=%%e
)

-- OUTPUT --

a=1
b=2
c=
d=
e=%e
dbenham answered Apr 08 '23 07:04