cmd.exe interacts with the user through a command-line interface. On Windows, this interface is implemented through the Win32 console. cmd.exe may take advantage of features available to native programs of its own platform.
A command interpreter is the part of a computer operating system that understands and executes commands that are entered interactively by a human being or from a program. In some operating systems, the command interpreter is called the shell.
Starting the Windows Command Shell
The Windows command shell is actually an application built into the Windows operating system. CMD.exe is the command interpreter that accepts your commands and executes them in the way you want.
Microsoft uses almost entirely C, C++, and C# for Windows. Some areas of code are hand-tuned or hand-written assembly.
We performed experiments to investigate the grammar of batch scripts. We also investigated differences between batch and command line mode.
Batch Line Parser:

Here is a brief overview of phases in the batch file line parser:
Phase 0) Read Line:
Phase 1) Percent Expansion:
Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes.
Phase 3) Echo the parsed command(s): Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.
Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.
Phase 5) Delayed Expansion: Only if delayed expansion is enabled
Phase 5.3) Pipe processing: Only if commands are on either side of a pipe
Phase 5.5) Execute Redirection:
Phase 6) CALL processing/Caret doubling: Only if the command token is CALL
Phase 7) Execute: The command is executed
Here are details for each phase:
Note that the phases described below are only a model of how the batch parser works. The actual cmd.exe internals may not reflect these phases. But this model is effective at predicting behavior of batch scripts.
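Phase 0) Read Line: Read line of input through first <LF>.
- When reading a line to be parsed as a command, <Ctrl-Z> (0x1A) is read as <LF> (LineFeed 0x0A).
- When GOTO or CALL reads lines while scanning for a :label, <Ctrl-Z> is treated as itself - it is not converted to <LF>.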
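To make the phase 0 rule concrete, here is a small C sketch of reading one line with the <Ctrl-Z> distinction. It is a model under stated assumptions, not cmd.exe's code. The function name `phase0_read_line` and the `scanning_for_label` flag are hypothetical, and the script is assumed to be an in-memory, NUL-terminated buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of phase 0: copy the next physical line of `buf`,
 * starting at *pos, into `out` (capacity `cap`, at least 1) and advance
 * *pos past the terminator. A line ends at <LF>. When the line is read as a
 * command (scanning_for_label == 0), <Ctrl-Z> (0x1A) also ends it. When
 * GOTO/CALL scan for a :label (scanning_for_label != 0), <Ctrl-Z> is kept
 * as an ordinary character. Characters beyond cap - 1 are dropped.
 * Returns the number of characters stored in `out`. */
size_t phase0_read_line(const char *buf, size_t *pos, char *out, size_t cap,
                        int scanning_for_label)
{
    size_t n = 0;
    while (buf[*pos] != '\0') {
        char c = buf[(*pos)++];
        if (c == '\n' || (c == 0x1A && !scanning_for_label))
            break;                      /* end of this line */
        if (n + 1 < cap)
            out[n++] = c;               /* ordinary character */
    }
    out[n] = '\0';
    return n;
}
```

Calling it repeatedly with the same `pos` walks through the buffer one line at a time. A <Ctrl-Z> in the middle of a command line therefore splits it into two lines, while the label scan sees the original text.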
Phase 1) Percent Expansion:
- A double %% is replaced by a single %.
- Arguments are expanded (%*, %1, %2, etc.).
- %var% is expanded; if var does not exist, replace it with nothing.
- The line is truncated at the first <LF> not within %var% expansion.

Phase 2) Process special characters, tokenize, and build a cached command block: This is a complex process that is affected by things such as quotes, special characters, token delimiters, and caret escapes. What follows is an approximation of this process.
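There are concepts that are important throughout this phase.
- A token is simply a string of characters that is treated as a unit.
- Tokens are separated by token delimiters. The standard token delimiters are <space> <tab> ; , = <0x0B> <0x0C> and <0xFF>. Consecutive token delimiters are treated as one; there are no empty tokens between token delimiters.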
The following characters may have special meaning in this phase, depending on context: <CR> ^ ( @ & | < > <LF> <space> <tab> ; , = <0x0B> <0x0C> <0xFF>
Look at each character from left to right:
- If <CR>, then remove it, as if it were never there (except for weird redirection behavior).
- If a caret (^), the next character is escaped, and the escaping caret is removed. Escaped characters lose all special meaning (except for <LF>).
- If a quote ("), toggle the quote flag. If the quote flag is active, then only " and <LF> are special. All other characters lose their special meaning until the next quote toggles the quote flag off. It is not possible to escape the closing quote. All quoted characters are always within the same token.
- <LF> always turns off the quote flag. Other behaviors vary depending on context, but quotes never alter the behavior of <LF>.
  - Escaped <LF>
    - <LF> is stripped.
    - The next character is escaped. If that character is also an <LF>, then it is treated as a literal, meaning this process is not recursive.
  - Unescaped <LF> not within parentheses
    - <LF> is stripped and parsing of the current line is terminated.
  - Unescaped <LF> within a FOR IN parenthesized block
    - <LF> is converted into a <space>.
  - Unescaped <LF> within a parenthesized command block
    - <LF> is converted into <LF><space>, and the <space> is treated as part of the next line of the command block.
- If one of the special characters & | < or >, split the line at this point in order to handle pipes, command concatenation, and redirection.
  - In the case of a pipe (|), each side is a separate command (or command block) that gets special handling in phase 5.3.
  - In the case of &, &&, or || command concatenation, each side of the concatenation is treated as a separate command.
  - In the case of <, <<, >, or >> redirection, the redirection clause is parsed, temporarily removed, and then appended to the end of the current command. A redirection clause consists of an optional file handle digit, the redirection operator, and the redirection destination token.
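- If the very first token for this command (prior to moving redirection to the end) begins with @, then the @ has special meaning. (@ is not special in any other context.)
  - The special @ is removed.
  - If ECHO is ON, then this command, along with any following concatenated commands on this line, is excluded from the phase 3 echo. If the @ is before an opening (, then the entire parenthesized block is excluded from the phase 3 echo.
- Process parentheses (provides for compound statements across multiple lines):
  - If the parser is not looking for a command token, then ( is not special.
  - If the parser is looking for a command token and finds (, then start a new compound statement and increment the parenthesis counter.
  - If the parenthesis counter is > 0, then ) terminates the compound statement and decrements the parenthesis counter.
  - If the parenthesis counter is 0 and the parser is looking for a command, then ) functions similar to a REM statement as long as it is immediately followed by a token delimiter, special character, newline, or end-of-file.
    - All special characters lose their meaning except ^ (line concatenation is possible).
- Each command is parsed into a series of tokens. The first token is always treated as a command token (after special @ have been stripped and redirection moved to the end).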
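  - When parsing the command token, ( functions as a command token delimiter, in addition to the standard token delimiters.
- In a FOR command, the IN parenthesized clause treats <LF> as <space>. After the IN clause is parsed, all tokens are concatenated together to form a single token.
- In a REM command, if there is only one argument token and it ends with an unescaped ^ that ends the line, then the argument token is thrown away, and the subsequent line is parsed and appended to the REM. This repeats until there is more than one token, or the last character is not ^.
- If the command token begins with :, and this is the first round of phase 2 (not a restart due to CALL in phase 6), then
  - The token is normally treated as an Unexecuted Label.
    - The remainder of the line is parsed, however ), <, >, & and | no longer have special meaning. The entire remainder of the line is considered to be part of the label "command".
    - The ^ continues to be special, meaning that line continuation can be used to append the subsequent line to the label.
    - Within a parenthesized block, ( no longer has special meaning for the first command that follows the Unexecuted Label.
  - A label is instead treated as an Executed Label, which continues parsing through phase 7, when (among other cases) there is redirection that precedes the label token, and there is a | pipe or &, &&, or || command concatenation on the line.

Phase 3) Echo the parsed command(s): Only if the command block did not begin with @, and ECHO was ON at the start of the preceding step.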
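The caret and quote rules from the phase 2 character scan can be sketched in C as below. This is a simplified model that assumes only `^` and `"` matter. <CR>, <LF>, parentheses, @ and redirection are ignored. The `active` mask (a hypothetical name, like the function itself) records which output characters are neither escaped nor quoted. An `&` marked active is therefore the one that would actually split the line.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of the caret and quote rules of the phase 2 scan.
 * Copies `in` to `out`, removing each escaping caret, and sets active[i]
 * to 1 when out[i] is neither escaped nor quoted, i.e. it could still act
 * as a special character such as & | < >. Quote characters are marked
 * active because they toggle the quote flag. Returns strlen(out). */
size_t scan_carets_quotes(const char *in, char *out, unsigned char *active)
{
    size_t n = 0;
    int in_quote = 0;
    for (const char *p = in; *p != '\0'; p++) {
        if (!in_quote && *p == '^') {
            if (p[1] == '\0')
                break;              /* trailing caret (line continuation) is not modeled */
            p++;                    /* drop the caret; the next char is escaped */
            out[n] = *p;
            active[n++] = 0;
            continue;
        }
        if (*p == '"')
            in_quote = !in_quote;   /* inside quotes a caret is literal, so the
                                       closing quote cannot be escaped */
        out[n] = *p;
        active[n++] = (unsigned char)(!in_quote || *p == '"');
    }
    out[n] = '\0';
    return n;
}
```

For the input `echo ^& "a & b" ^^x&y`, the output is `echo & "a & b" ^x&y`. Only the last `&` is still active: the first was escaped and the second is quoted.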
Phase 4) FOR %X variable expansion: Only if a FOR command is active and the commands after DO are being processed.
- At this point, phase 1 of batch processing will have already converted a FOR variable like %%X into %X. The command line has different percent expansion rules for phase 1. This is the reason that command lines use %X but batch files use %%X for FOR variables.
- FOR variable names are case sensitive, but ~modifiers are not case sensitive.
- ~modifiers take precedence over variable names. If a character following ~ is both a modifier and a valid FOR variable name, and there exists a subsequent character that is an active FOR variable name, then the character is interpreted as a modifier.

---- From this point onward, each command identified in phase 2 is processed separately.
---- Phases 5 through 7 are completed for one command before moving on to the next.
Phase 5) Delayed Expansion: Only if delayed expansion is on, the command is not in a parenthesized block on either side of a pipe, and the command is not a "naked" batch script (script name without parentheses, CALL, command concatenation, or pipe).
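- Each token for a command is parsed for delayed expansion independently.
- For each parsed token, first check if it contains any !. If not, then the token is not parsed - important for ^ characters. If the token does contain !, then scan each character from left to right:
  - If it is a caret (^), the next character has no special meaning, and the caret itself is removed.
  - If it is an exclamation mark, search for the next exclamation mark (carets are not observed anymore) and expand to the value of the variable.
    - Consecutive opening ! are collapsed into a single !
    - Any remaining unpaired ! is removed.
  - Expanding variables at this stage is "safe", because special characters are not detected anymore (even <CR> or <LF>).

Phase 5.3) Pipe processing: Only if commands are on either side of a pipe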
Each side of the pipe is processed independently and asynchronously.
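- If the command is internal to cmd.exe, or it is a batch file, or if it is a parenthesized command block, then it is executed in a new cmd.exe thread via %comspec% /S /D /c" commandBlock", so the command block gets a phase restart, but this time in command line mode.
  - If it is a parenthesized command block, then all <LF> with a command before and after are converted to <space>&. Other <LF> are stripped.

Phase 5.5) Execute Redirection: Any redirection that was discovered in phase 2 is now executed.
- If the redirection fails, then the remainder of the command is aborted. Note that failed redirection does not set ERRORLEVEL to 1 unless || is used.

Phase 6) CALL processing/Caret doubling: Only if the command token is CALL, or if the text before the first occurring standard token delimiter is CALL. If CALL is parsed from a larger command token, then the unused portion is prepended to the arguments token before proceeding.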
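- Scan the arguments token for an unquoted /?. If found anywhere within the tokens, then abort phase 6 and proceed to Phase 7, where the HELP for CALL will be printed.
- Remove the first CALL, so multiple CALL's can be stacked.
- Double all carets.
- Restart phases 1 and 2, but do not continue to phase 3.
  - Any doubled carets are reduced back to one caret as long as they are not quoted. But unfortunately quoted carets remain doubled.
  - The CALL is aborted without error if any of the following are detected:
    - Newly appearing unquoted, unescaped & or |
    - The resultant command token begins with unquoted, unescaped (
    - The very first token after the removed CALL began with @
  - If the resultant command is a seemingly valid IF or FOR, then execution will subsequently fail with an error stating that IF or FOR is not recognized as an internal or external command.
  - Of course the CALL is not aborted in this 2nd round of phase 2 if the resultant command token begins with :.
- If the resultant command token is a label that begins with :, then Phase 5 is restarted and GOTO is used to position the file pointer at the beginning of the subroutine.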
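Here is a sketch of what CALL does to carets, assuming the caret is the only special character involved. CALL doubles every caret. The restarted phase 2 then treats each unquoted `^^` as an escaped caret and keeps one of them. Quoted carets are ordinary characters, so they stay doubled. The function names are hypothetical, and the other phase 6 checks are not modeled.

```c
#include <string.h>

/* Phase 6 step: double every caret in the CALL argument text.
 * `out` needs room for 2 * strlen(in) + 1 bytes. */
static void double_carets(const char *in, char *out)
{
    for (; *in != '\0'; in++) {
        if (*in == '^')
            *out++ = '^';
        *out++ = *in;
    }
    *out = '\0';
}

/* Caret handling of the restarted phase 2 (simplified): outside quotes a
 * caret is removed and the next character is kept literally; inside quotes
 * carets are ordinary characters. An unquoted caret at the very end is
 * removed without line continuation. */
static void strip_escape_carets(const char *in, char *out)
{
    int in_quote = 0;
    for (; *in != '\0'; in++) {
        if (*in == '"') {
            in_quote = !in_quote;
        } else if (*in == '^' && !in_quote) {
            in++;                   /* skip the escaping caret */
            if (*in == '\0')
                break;
        }
        *out++ = *in;
    }
    *out = '\0';
}

/* Net effect of CALL on carets: `doubled` receives the intermediate text,
 * `out` the text after the restarted phase 2. */
void call_carets(const char *in, char *doubled, char *out)
{
    double_carets(in, doubled);
    strip_escape_carets(doubled, out);
}
```

For example, with these rules `call echo x^^ "q^"` would print `x^ "q^^"`. The first phase 2 turns the arguments into `x^ "q^"`. The CALL round trip then restores the unquoted caret but leaves the quoted one doubled.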
Phase 7) Execute: The command is executed
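- 7.1 - Execute internal command: If the command token is quoted, then skip this step. Otherwise, attempt to parse out an internal command and execute it.
  - The following tests are made to determine if an unquoted command token represents an internal command:
    - If the command token exactly matches an internal command, then execute it.
    - Else break the command token before the first occurrence of + / [ ] <space> <tab> , ; or =. If the preceding text is an internal command, then remember that command.
    - Else break the command token before the first occurrence of . \ or :. If the preceding text is not an internal command, then goto 7.2. Else the preceding text may be an internal command; remember this command.
    - Break the command token before the first occurrence of + / [ ] <space> <tab> , ; or =. If the preceding text is a path to an existing file, then goto 7.2. Else execute the remembered internal command.
  - All internal commands will print help instead of performing their function if /? is detected. Most recognize /? if it appears anywhere in the arguments. But a few commands like ECHO and SET only print help if the first argument token begins with /?.
  - SET has some interesting semantics:
    - If a SET command has a quote before the variable name and extensions are enabled
        set "name=content" ignored  -->  value=content
      then the text between the first equal sign and the last quote is used as the content (first equal sign and last quote excluded). Text after the last quote is ignored. If there is no quote after the equal sign, then the rest of the line is used as content.
    - If a SET command does not have a quote before the name
        set name="content" not ignored  -->  value="content" not ignored
      then the entire remainder of the line after the equal sign is used as content, including any and all quotes that may be present.
- 7.2 - Execute volume change: Else if the command token does not begin with a quote, is exactly two characters long, and the 2nd character is a colon, then change the volume.
  - A command token of :: will always result in an error unless SUBST is used to define a volume for ::. If SUBST is used to define a volume for ::, then the volume will be changed; it will not be treated as a label.
- 7.3 - Execute external command: Else try to treat the command as an external command.
  - If in command line mode and the command is not quoted and does not begin with a volume specification, white-space, ,, ;, = or +, then break the command token at the first occurrence of <space> , ; or = and prepend the remainder to the argument token(s).
  - If the 2nd character of the command token is a colon, then verify that the volume specified by the 1st character can be found. If the volume cannot be found, then abort with an error.
  - If in batch mode and the command token begins with :, then goto 7.4. Note that if the label token begins with ::, then this will not be reached because the preceding step will have aborted with an error unless SUBST is used to define a volume for ::.
  - Identify the external command to execute. If a valid external command cannot be identified, then abort with an error.
  - If in command line mode and the command token begins with :, then goto 7.4. Note that this is rarely reached because the preceding step will have aborted with an error unless the command token begins with ::, and SUBST is used to define a volume for ::, and the entire command token is a valid path to an external command.
- 7.4 - Ignore a label: Ignore the command and all its arguments if the command token begins with :.

Command Line Parser:

Works like the BatchLine-Parser, except: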
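Phase 1) Percent Expansion:
- No %*, %1 etc. argument expansion.
- If var is undefined, then %var% is left unchanged.
- No special handling of %%. If var=content, then %%var%% expands to %content%.

Phase 3) Echo the parsed command(s)
- This is not performed after phase 2. It is only performed after phase 4 for the FOR DO command block.

Phase 5) Delayed Expansion: only if DelayedExpansion is enabled
- If var is undefined, then !var! is left unchanged.

Phase 7) Execute Command
- Batch executed labels can only cause an error if they begin with ::, but command line executed labels almost always result in an error.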
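The phase 1 difference between batch mode and command line mode can be sketched as follows. Only plain %var% and %% are modeled: there is no %1, %*, ~modifiers, or :~ / := forms. `expand_percent`, `lookup_fn`, and the one-variable environment in `demo_lookup` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variable lookup: returns the value of the `len`-byte name, or NULL. */
typedef const char *(*lookup_fn)(const char *name, size_t len);

/* Hypothetical environment for the example: only var=content is defined. */
static const char *demo_lookup(const char *name, size_t len)
{
    return (len == 3 && strncmp(name, "var", 3) == 0) ? "content" : NULL;
}

/* Sketch of %var% expansion only.
 * batch_mode != 0: %% -> %, an undefined %var% is removed, a lone % is removed.
 * batch_mode == 0: %% is not special; an undefined %var% keeps its leading %
 * and scanning resumes right after it, so the text survives unchanged. */
void expand_percent(const char *in, char *out, int batch_mode, lookup_fn lookup)
{
    while (*in != '\0') {
        if (*in != '%') {
            *out++ = *in++;
            continue;
        }
        if (batch_mode && in[1] == '%') {
            *out++ = '%';
            in += 2;
            continue;
        }
        const char *end = strchr(in + 1, '%');
        const char *val = end ? lookup(in + 1, (size_t)(end - in - 1)) : NULL;
        if (val != NULL) {
            size_t n = strlen(val);
            memcpy(out, val, n);
            out += n;
            in = end + 1;
        } else if (batch_mode) {
            in = end ? end + 1 : in + 1;   /* drop %VAR% or the lone % */
        } else {
            *out++ = *in++;                /* keep the leading %, rescan after it */
        }
    }
    *out = '\0';
}
```

With var=content, batch mode turns `%var% %undef% 100%%` into `content  100%`. Command line mode turns `%var% %undef% %%var%%` into `content %undef% %content%`, matching the rules listed above.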
There are many different contexts where cmd.exe parses integer values from strings, and the rules are inconsistent:
- SET /A
- IF
- %var:~n,m% (variable substring expansion)
- FOR /F "TOKENS=n"
- FOR /F "SKIP=n"
- FOR /L %%A in (n1 n2 n3)
- EXIT [/B] n

Details for these rules may be found at "Rules for how CMD.EXE parses numbers".
For anyone wishing to improve the cmd.exe parsing rules, there is a discussion topic on the DosTips forum where issues can be reported and suggestions made.
Hope it helps
Jan Erik (jeb) - Original author and discoverer of phases
Dave Benham (dbenham) - Much additional content and editing
When invoking a command from a command window, tokenization of the command line arguments is not done by cmd.exe (a.k.a. "the shell"). Most often the tokenization is done by the newly formed process's C/C++ runtime, but this is not necessarily so -- for example, if the new process was not written in C/C++, or if the new process chooses to ignore argv and process the raw command line for itself (e.g. with GetCommandLine()). At the OS level, Windows passes command lines untokenized as a single string to new processes. This is in contrast to most *nix shells, where the shell tokenizes arguments in a consistent, predictable way before passing them to the newly formed process. All this means that you may experience wildly divergent argument tokenization behavior across different programs on Windows, as individual programs often take argument tokenization into their own hands.
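If it sounds like anarchy, it kind of is. However, since a large number of Windows programs do utilize the Microsoft C/C++ runtime's argv, it may be generally useful to understand how the MSVCRT tokenizes arguments. Here is an excerpt:
- Arguments are delimited by white space, which is either a space or a tab.
- A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
- A double quotation mark preceded by a backslash, \", is interpreted as a literal double quotation mark (").
- Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
- If an even number of backslashes is followed by a double quotation mark, then one backslash (\) is placed in the argv array for every pair of backslashes (\\), and the double quotation mark (") is interpreted as a string delimiter.
- If an odd number of backslashes is followed by a double quotation mark, then one backslash (\) is placed in the argv array for every pair of backslashes (\\), and the double quotation mark is interpreted as an escape sequence by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.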
The Microsoft "batch language" (.bat) is no exception to this anarchic environment, and it has developed its own unique rules for tokenization and escaping. It also looks like cmd.exe's command prompt does do some preprocessing of the command line argument (mostly for variable substitution and escaping) before passing the argument off to the newly executing process. You can read more about the low-level details of the batch language and cmd escaping in the excellent answers by jeb and dbenham on this page.
Let's build a simple command line utility in C and see what it says about your test cases:
#include <stdio.h>

int main(int argc, char* argv[]) {
    int i;
    for (i = 0; i < argc; i++) {
        printf("argv[%d][%s]\n", i, argv[i]);
    }
    return 0;
}
(Notes: argv[0] is always the name of the executable, and is omitted below for brevity. Tested on Windows XP SP3. Compiled with Visual Studio 2005.)
> test.exe "a ""b"" c"
argv[1][a "b" c]
> test.exe """a b c"""
argv[1]["a b c"]
> test.exe "a"" b c
argv[1][a" b c]
And a few of my own tests:
> test.exe a "b" c
argv[1][a]
argv[2][b]
argv[3][c]
> test.exe a "b c" "d e
argv[1][a]
argv[2][b c]
argv[3][d e]
> test.exe a \"b\" c
argv[1][a]
argv[2]["b"]
argv[3][c]
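Below is a C sketch of MSVCRT-style splitting that reproduces each of the outputs above. It follows the backslash and quote rules quoted earlier. The handling of `""` inside a quoted argument (emit a literal quote and stay in quoted mode) is not part of those documented rules. It is an assumption chosen here because it matches these outputs. `split_msvcrt` is a hypothetical name, and argv[0] is not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of MSVCRT-style argument splitting for the text AFTER the program
 * name. Arguments are packed NUL-terminated into `buf` (needs at least
 * strlen(cmdline) + max_args bytes) and argv_out[i] points at each one.
 * Returns the argument count. */
size_t split_msvcrt(const char *cmdline, char *buf, char **argv_out, size_t max_args)
{
    size_t argc = 0;
    const char *p = cmdline;
    for (;;) {
        while (*p == ' ' || *p == '\t')          /* white space separates arguments */
            p++;
        if (*p == '\0' || argc == max_args)
            break;
        argv_out[argc++] = buf;
        int in_quote = 0;
        while (*p != '\0' && (in_quote || (*p != ' ' && *p != '\t'))) {
            size_t slashes = 0;
            while (*p == '\\') {
                slashes++;
                p++;
            }
            if (*p == '"') {
                for (size_t i = 0; i < slashes / 2; i++)
                    *buf++ = '\\';               /* 2n backslashes + " -> n backslashes */
                if (slashes % 2 == 1) {
                    *buf++ = '"';                /* odd count: \" is a literal quote */
                } else if (in_quote && p[1] == '"') {
                    *buf++ = '"';                /* "" inside quotes: literal ", stay quoted */
                    p++;
                } else {
                    in_quote = !in_quote;        /* quote delimiter, not copied */
                }
                p++;
            } else {
                for (size_t i = 0; i < slashes; i++)
                    *buf++ = '\\';               /* backslashes not before " are literal */
                if (*p == '\0' || (!in_quote && (*p == ' ' || *p == '\t')))
                    break;                       /* argument ended right after them */
                *buf++ = *p++;                   /* ordinary character */
            }
        }
        *buf++ = '\0';
    }
    return argc;
}
```

For example, the text `a \"b\" c` yields three arguments, `a`, `"b"` and `c`, just like the last test.exe run.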
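Here is an expanded explanation of Phase 1 in jeb's answer (valid for both batch mode and command line mode).

Phase 1) Percent Expansion
Starting from left, scan each character for % or <LF>. If found then
- 1.05 (truncate line at <LF>)
  - If character is <LF> then
    - Drop (ignore) the remainder of the line from the <LF> onward
    - Goto Phase 1.5 (strip <CR>)
  - Else character must be %, so proceed to 1.1
- 1.1 (escape %) skipped if command line mode
  - If batch mode and followed by another % then
    Replace %% with single % and continue scan
- 1.2 (expand argument) skipped if command line mode
  - Else if batch mode then
    - If followed by * and command extensions are enabled then
      Replace %* with the text of all command line arguments (replace with nothing if there are no arguments) and continue scan.
    - Else if followed by <digit> then
      Replace %<digit> with argument value (replace with nothing if undefined) and continue scan.
    - Else if followed by ~ and command extensions are enabled then
      - If followed by optional valid list of argument modifiers followed by required <digit> then
        Replace %~[modifiers]<digit> with modified argument value (replace with nothing if not defined or if specified $PATH: modifier is not defined) and continue scan.
      - Else generate fatal syntax error
- 1.3 (expand variable)
  - Else if command extensions are disabled then
    Look at next string of characters, breaking before % or end of buffer, and call them VAR (may be an empty list)
    - If next character is % then
      - If VAR is defined then
        Replace %VAR% with value of VAR and continue scan
      - Else if batch mode then
        Remove %VAR% and continue scan
      - Else goto 1.4
    - Else goto 1.4
  - Else if command extensions are enabled then
    Look at next string of characters, breaking before % : or end of buffer, and call them VAR (may be an empty list). If VAR breaks before : and the subsequent character is % then include : as the last character in VAR and break before %.
    - If next character is % then
      - If VAR is defined then
        Replace %VAR% with value of VAR and continue scan
      - Else if batch mode then
        Remove %VAR% and continue scan
      - Else goto 1.4
    - Else if next character is : then
      - If VAR is undefined then
        - If batch mode then
          Remove %VAR: and continue scan.
        - Else goto 1.4
      - Else if next character is ~ then
        - If next string of characters matches pattern of [integer][,[integer]]% then
          Replace %VAR:~[integer][,[integer]]% with substring of value of VAR (possibly resulting in empty string) and continue scan.
        - Else goto 1.4
      - Else if followed by = or *= then
        Invalid variable search and replace syntax raises fatal syntax error
      - Else if next string of characters matches pattern of [*]search=[replace]%, where search may include any set of characters except =, and replace may include any set of characters except %, then
        Replace %VAR:[*]search=[replace]% with value of VAR after performing search and replace (possibly resulting in empty string) and continue scan
      - Else goto 1.4
- 1.4 (strip %)
  - Else if batch mode then
    Remove % and continue scan starting with the next character after the %
  - Else preserve leading % and continue scan starting with the next character after the preserved leading %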
The above helps explain why this batch
@echo off
setlocal enableDelayedExpansion
set "1var=varA"
set "~f1var=varB"
call :test "arg1"
exit /b
::
:test "arg1"
echo %%1var%% = %1var%
echo ^^^!1var^^^! = !1var!
echo --------
echo %%~f1var%% = %~f1var%
echo ^^^!~f1var^^^! = !~f1var!
exit /b
Gives these results:
%1var% = "arg1"var
!1var! = varA
--------
%~f1var% = P:\arg1var
!~f1var! = varB
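The order of steps 1.1 to 1.4 is what produces `"arg1"var`. `%1` is consumed as an argument reference before any `%VAR%` lookup can happen, and the trailing lone `%` is then removed. The C sketch below models only that part of batch-mode phase 1: there is no <LF> handling, no ~modifiers, and no :~ / := forms. `struct batch_ctx`, `phase1_batch`, and `demo_env` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical batch context: argument values and an environment lookup. */
struct batch_ctx {
    const char *args[10];                                  /* %0..%9, NULL if undefined */
    const char *all_args;                                  /* text of %*, NULL if none */
    const char *(*getenv_fn)(const char *name, size_t len);
};

/* Hypothetical environment matching the script above: 1var=varA. */
static const char *demo_env(const char *name, size_t len)
{
    return (len == 4 && strncmp(name, "1var", 4) == 0) ? "varA" : NULL;
}

static char *append(char *out, const char *s)
{
    size_t n = strlen(s);
    memcpy(out, s, n);
    return out + n;
}

/* Batch-mode phase 1, steps 1.1 to 1.4 in order (command extensions enabled). */
void phase1_batch(const char *in, char *out, const struct batch_ctx *ctx)
{
    while (*in != '\0') {
        if (*in != '%') {
            *out++ = *in++;
            continue;
        }
        char next = in[1];
        if (next == '%') {                          /* 1.1: %% -> % */
            *out++ = '%';
            in += 2;
        } else if (next == '*') {                   /* 1.2: %* */
            out = append(out, ctx->all_args ? ctx->all_args : "");
            in += 2;
        } else if (next >= '0' && next <= '9') {    /* 1.2: %<digit> wins over %VAR% */
            const char *a = ctx->args[next - '0'];
            out = append(out, a ? a : "");
            in += 2;
        } else {
            const char *end = strchr(in + 1, '%');  /* 1.3: VAR runs to the next % */
            if (end != NULL) {
                const char *val = ctx->getenv_fn(in + 1, (size_t)(end - in - 1));
                out = append(out, val ? val : "");  /* undefined: removed */
                in = end + 1;
            } else {
                in++;                               /* 1.4: lone % is removed */
            }
        }
    }
    *out = '\0';
}
```

Run on the script's line `echo %%1var%% = %1var%` with %1 set to `"arg1"`, it yields `echo %1var% = "arg1"var`, the first result line above.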
Note 1 - Phase 1 occurs prior to the recognition of REM statements. This is very important because it means even a remark can generate a fatal error if it has invalid argument expansion syntax or invalid variable search and replace syntax!
@echo off
rem %~x This generates a fatal argument expansion error
echo this line is never reached
Note 2 - Another interesting consequence of the % parsing rules: Variables containing : in the name can be defined, but they cannot be expanded unless command extensions are disabled. There is one exception - a variable name containing a single colon at the end can be expanded while command extensions are enabled. However, you cannot perform substring or search and replace operations on variable names ending with a colon. The batch file below (courtesy of jeb) demonstrates this behavior:
@echo off
setlocal
set var=content
set var:=Special
set var::=double colon
set var:~0,2=tricky
set var::~0,2=unfortunate
echo %var%
echo %var:%
echo %var::%
echo %var:~0,2%
echo %var::~0,2%
echo Now with DisableExtensions
setlocal DisableExtensions
echo %var%
echo %var:%
echo %var::%
echo %var:~0,2%
echo %var::~0,2%
Note 3 - An interesting outcome of the order of the parsing rules that jeb lays out in his post: When performing find and replace with delayed expansion, special characters in both the find and replace terms must be escaped or quoted. But the situation is different for percent expansion - the find term must not be escaped (though it can be quoted). The percent replace string may or may not require escape or quote, depending on your intent.
@echo off
setlocal enableDelayedExpansion
set "var=this & that"
echo %var:&=and%
echo "%var:&=and%"
echo !var:^&=and!
echo "!var:&=and!"
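For reference, the value produced by the `[*]search=[replace]` form that Note 3 relies on can be sketched like this. Two assumptions are built in: matching is ASCII case-insensitive, and the search string is non-empty. With the leading `*`, everything up to and including the first match is replaced. `search_replace` is a hypothetical name. Escaping is out of scope because it is resolved in the parsing phases before the substitution happens.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* 1 if the n characters at s match pat, ignoring ASCII case. A mismatch at
 * the end of s (its NUL) stops the loop before reading past it. */
static int match_ci(const char *s, const char *pat, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)pat[i]))
            return 0;
    return 1;
}

/* Value of %VAR:search=replace% (star == 0) or %VAR:*search=replace%
 * (star != 0). `out` must be large enough for the result. */
void search_replace(const char *value, const char *search,
                    const char *replace, int star, char *out)
{
    size_t slen = strlen(search), rlen = strlen(replace);
    if (slen == 0) {                         /* a syntax error in cmd.exe; the sketch just copies */
        strcpy(out, value);
        return;
    }
    if (star) {                              /* replace up to and including the first match */
        for (const char *p = value; *p != '\0'; p++) {
            if (match_ci(p, search, slen)) {
                memcpy(out, replace, rlen);
                strcpy(out + rlen, p + slen);
                return;
            }
        }
        strcpy(out, value);                  /* no match: value unchanged */
        return;
    }
    const char *p = value;
    while (*p != '\0') {
        if (match_ci(p, search, slen)) {     /* replace every occurrence */
            memcpy(out, replace, rlen);
            out += rlen;
            p += slen;
        } else {
            *out++ = *p++;
        }
    }
    *out = '\0';
}
```

With var=this & that, `%var:&=and%` corresponds to `search_replace("this & that", "&", "and", 0, out)`, which gives `this and that`.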
Here is an expanded and more accurate explanation of phase 5 in jeb's answer (valid for both batch mode and command line mode).
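Phase 5) Delayed Expansion
This phase is skipped if any of the following conditions apply:
- Delayed expansion is disabled.
- The command is within a parenthesized block on either side of a pipe.
- The incoming command token is a "naked" batch script, meaning it is not associated with CALL, parenthesized block, any form of command concatenation (&, && or ||), or a pipe |.

The delayed expansion process is applied to tokens independently. A command may have multiple tokens, for example:
- for ... in(TOKEN) do
- if defined TOKEN
- if exists TOKEN
- if errorlevel TOKEN
- if cmdextversion TOKEN
- if TOKEN comparison TOKEN, where comparison is one of ==, equ, neq, lss, leq, gtr, or geq

No change is made to tokens that do not contain !.

For each token that does contain at least one !, scan each character from left to right for ^ or !, and if found, then
- 5.1 (caret escape) Needed for ! or ^ literals
  - If character is a caret ^ then
    - Remove the ^
    - Scan the next character and preserve it as a literal
    - Continue the scan
- 5.2 (expand variable)
  - If character is !, then
    - If command extensions are disabled then
      Look at next string of characters, breaking before ! or <LF>, and call them VAR (may be an empty list)
      - If next character is ! then
        - If VAR is defined, then
          Replace !VAR! with value of VAR and continue scan
        - Else if batch mode then
          Remove !VAR! and continue scan
        - Else goto 5.2.1
      - Else goto 5.2.1
    - Else if command extensions are enabled then
      Look at next string of characters, breaking before !, :, or <LF>, and call them VAR (may be an empty list). If VAR breaks before : and the subsequent character is ! then include : as the last character in VAR and break before !
      - If next character is ! then
        - If VAR exists, then
          Replace !VAR! with value of VAR and continue scan
        - Else if batch mode then
          Remove !VAR! and continue scan
        - Else goto 5.2.1
      - Else if next character is : then
        - If VAR is undefined then
          - If batch mode then
            Remove !VAR: and continue scan
          - Else goto 5.2.1
        - Else if next character is ~ then
          - If next string of characters matches pattern of [integer][,[integer]]! then
            Replace !VAR:~[integer][,[integer]]! with substring of value of VAR (possibly resulting in empty string) and continue scan.
          - Else goto 5.2.1
        - Else if next string of characters matches pattern of [*]search=[replace]!, where search may include any set of characters except =, and replace may include any set of characters except !, then
          Replace !VAR:[*]search=[replace]! with value of VAR after performing search and replace (possibly resulting in an empty string) and continue scan
        - Else goto 5.2.1
      - Else goto 5.2.1
    - 5.2.1
      - If batch mode then remove the leading !
        Else preserve the leading !
      - Continue the scan starting with the next character after the leading !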
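Here is a C sketch of the batch-mode scan for a single token. It is restricted to the extensions-disabled rules: VAR ends at the next !, and there are no :~ or search-and-replace forms and no <LF> handling. `delayed_expand_token`, `env_fn`, and `demo_env` are hypothetical. `demo_env` defines only var, whose value deliberately contains ^ and !. The expanded value is copied as-is and never rescanned, which is why expansion at this stage is safe.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*env_fn)(const char *name, size_t len);

/* Hypothetical environment for the example: only var is defined. */
static const char *demo_env(const char *name, size_t len)
{
    return (len == 3 && strncmp(name, "var", 3) == 0) ? "a^b!" : NULL;
}

/* Batch-mode phase 5 for ONE token, using the extensions-disabled rules. */
void delayed_expand_token(const char *in, char *out, env_fn getenv_fn)
{
    if (strchr(in, '!') == NULL) {       /* no '!': token is not parsed, carets stay */
        strcpy(out, in);
        return;
    }
    while (*in != '\0') {
        if (*in == '^') {                /* 5.1: remove caret, keep next char literally */
            in++;
            if (*in != '\0')
                *out++ = *in++;
        } else if (*in == '!') {         /* 5.2: look for the closing '!' */
            const char *end = strchr(in + 1, '!');
            if (end != NULL) {
                const char *val = getenv_fn(in + 1, (size_t)(end - in - 1));
                if (val != NULL) {       /* value is copied verbatim, never rescanned */
                    size_t n = strlen(val);
                    memcpy(out, val, n);
                    out += n;
                }                        /* undefined: !VAR! is removed (batch mode) */
                in = end + 1;
            } else {
                in++;                    /* 5.2.1: unpaired leading ! is removed */
            }
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

For instance, `[!var!] ^!var^! !undef!` becomes `[a^b!] !var! ` (with a trailing space). By contrast, `x^^y` is returned unchanged because it contains no !.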
As pointed out, commands are passed the entire argument string in μSoft land, and it is up to them to parse this into separate arguments for their own use. There is no consistency in this between different programs, and therefore there is no one set of rules to describe this process. You really need to check each corner case for whatever C library your program uses.
As far as the system .bat files go, here is that test:
c> type args.cmd
@echo off
echo cmdcmdline:[%cmdcmdline%]
echo 0:[%0]
echo *:[%*]
set allargs=%*
if not defined allargs goto :eof
setlocal
@rem Wot about a nice for loop?
@rem Then we are in the land of delayedexpansion, !n!, call, etc.
@rem Plays havoc with args like %t%, a"b etc. ugh!
set n=1
:loop
echo %n%:[%1]
set /a n+=1
shift
set param=%1
if defined param goto :loop
endlocal
Now we can run some tests. See if you can figure out just what μSoft are trying to do:
C>args a b c
cmdcmdline:[cmd.exe ]
0:[args]
*:[a b c]
1:[a]
2:[b]
3:[c]
Fine so far. (I'll leave out the uninteresting %cmdcmdline% and %0 from now on.)
C>args *.*
*:[*.*]
1:[*.*]
No filename expansion.
C>args "a b" c
*:["a b" c]
1:["a b"]
2:[c]
No quote stripping, though quotes do prevent argument splitting.
c>args ""a b" c
*:[""a b" c]
1:[""a]
2:[b" c]
Consecutive double quotes cause them to lose any special parsing abilities they may have had. @Beniot's example:
C>args "a """ b "" c"""
*:["a """ b "" c"""]
1:["a """]
2:[b]
3:[""]
4:[c"""]
Quiz: How do you pass the value of any environment var as a single argument (i.e., as %1) to a bat file?
c>set t=a "b c
c>set t
t=a "b c
c>args %t%
1:[a]
2:["b c]
c>args "%t%"
1:["a "b]
2:[c"]
c>Aaaaaargh!
Sane parsing seems forever broken.
For your entertainment, try adding miscellaneous ^, \, ', & (&c.) characters to these examples.
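The %1, %2, ... splits in the tests above fit a simple model, sketched below in C. A quote only toggles a flag and is kept in the argument. Outside quotes, the standard token delimiters separate arguments. The tests only use spaces, so treating tab, comma, semicolon and = as separators comes from the phase 2 delimiter list rather than from these runs. `split_batch_args` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Split batch arguments: quotes toggle a flag and are kept; outside quotes
 * <space> <tab> , ; = separate arguments (consecutive ones collapse).
 * `storage` needs at least strlen(line) + max bytes. Returns the count. */
size_t split_batch_args(const char *line, char *storage, char **args, size_t max)
{
    size_t n = 0;
    int in_quote = 0, in_arg = 0;
    for (; *line != '\0'; line++) {
        char c = *line;
        if (!in_quote && strchr(" \t,;=", c) != NULL) {
            if (in_arg) {                /* delimiter ends the current argument */
                *storage++ = '\0';
                in_arg = 0;
            }
            continue;
        }
        if (!in_arg) {                   /* first character of a new argument */
            if (n == max)
                break;
            args[n++] = storage;
            in_arg = 1;
        }
        if (c == '"')
            in_quote = !in_quote;        /* the quote itself is kept */
        *storage++ = c;
    }
    if (in_arg)
        *storage = '\0';
    return n;
}
```

Fed `""a b" c`, it returns `""a` and `b" c`. Fed `"a "b c"`, it returns `"a "b` and `c"`. These are the same surprises shown above.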
You have some great answers above already, but to answer one part of your question:
set a =b, echo %a %b% c% → bb c%
What is happening there is that because you have a space before the =, a variable is created called %a<space>%, so when you echo %a % that is evaluated correctly as b.
The remaining part b% c% is then evaluated as plain text plus an undefined variable % c%, which should be echoed as typed; for me, echo %a %b% c% returns bb% c%.
I suspect that the ability to include spaces in variable names is more of an oversight than a planned 'feature'.
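Here is a sketch of how SET appears to split its argument text. It covers the `set a =b` case as well as the quoted form `set "name=content" ignored` versus `set name="content" not ignored` described in phase 7 of the accepted answer. It assumes command extensions are enabled and ignores /A, /P and the display-only form. `parse_set` is hypothetical.

```c
#include <string.h>

/* Sketch of how SET splits "name=value". Returns 0, or -1 without '='. */
int parse_set(const char *arg, char *name, char *value)
{
    int quoted = (arg[0] == '"');
    const char *start = quoted ? arg + 1 : arg;
    const char *eq = strchr(start, '=');
    if (eq == NULL)
        return -1;
    size_t nlen = (size_t)(eq - start);
    memcpy(name, start, nlen);           /* everything before '=' is the name, */
    name[nlen] = '\0';                   /* including any trailing spaces */
    const char *vbeg = eq + 1;
    const char *vend = vbeg + strlen(vbeg);
    if (quoted) {
        const char *last = strrchr(vbeg, '"');
        if (last != NULL)
            vend = last;                 /* text from the last quote on is ignored */
    }
    memcpy(value, vbeg, (size_t)(vend - vbeg));
    value[vend - vbeg] = '\0';
    return 0;
}
```

Parsing `a =b` yields the name `a ` (with the trailing space) and the value `b`, which is why `%a %` expands to b.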
FOR-Loop Meta-Variable Expansion
This is an extended explanation of Phase 4) in the accepted answer (applicable for both batch file mode and command line mode). Of course a for command must be active. The following describes the processing of the command line portion after the do clause. Note that in batch file mode, %% has already been converted to % due to the foregoing immediate %-expansion phase (Phase 1).
- scan for a %-sign, beginning from the left up to the end of the line; if one is found, then:
  - if Command Extensions are enabled, check whether the next character is ~; if yes, then:
    - take as many as possible of the characters fdpnxsatz (even multiple times each) that are preceding a character that defines a for variable reference or a $-sign; if such a $-sign is encountered, then:
      - scan for a : (note 1); if found, then:
        - if there is a character following the :, use it as a for variable reference and expand as expected, unless it is not defined, then do not expand and continue scan at that character position;
        - if the : is the last character, cmd.exe will crash!
      - else (if no : is found) do not expand anything;
    - else (if no $-sign is encountered) expand the for variable using all the modifiers, unless it is not defined, then do not expand and continue scan at that character position;
  - else (if no ~ is found or Command Extensions are disabled) check the next character:
    - if it is %, do not expand anything and go back to the beginning of the scan at this character position (note 2);
    - otherwise use it as a for variable reference and expand, unless such is not defined, then do not expand;

1) The string between $ and : is considered as the name of an environment variable, which may even be empty; since an environment variable cannot have an empty name, the behaviour is just the same as for an undefined environment variable.
2) This implies that a for meta-variable named % cannot be expanded without a ~-modifier.
Original source: How to safely echo FOR variable %%~p followed by a string literal
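Here is a C sketch of the ~modifier parsing described above, applied to the text right after a %. It assumes command extensions are enabled and takes the active FOR variable names as a plain string. It only parses; it does not expand. `struct for_ref` and `parse_for_ref` are hypothetical. The backwards loop implements "as many modifiers as possible, as long as an active variable follows", which is why ~modifiers win over variable names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of parsing one FOR variable reference. */
struct for_ref {
    char modifiers[32];  /* ~modifiers as written, possibly empty */
    char env[64];        /* NAME of a ~$NAME:X reference, possibly empty */
    char var;            /* the FOR variable character */
};

static int is_modifier(char c)
{
    return c != '\0' && strchr("fdpnxsatzFDPNXSATZ", c) != NULL;
}

static int is_active(char c, const char *active)
{
    return c != '\0' && strchr(active, c) != NULL;
}

/* Parse the text that follows a '%'. Returns how many characters after the
 * '%' belong to the reference, or 0 when nothing would be expanded. */
size_t parse_for_ref(const char *p, const char *active, struct for_ref *r)
{
    memset(r, 0, sizeof *r);
    if (p[0] != '~') {                   /* plain %X; a variable named % needs ~ */
        if (p[0] == '%' || !is_active(p[0], active))
            return 0;
        r->var = p[0];
        return 1;
    }
    const char *m = p + 1;               /* first character after '~' */
    size_t run = 0;
    while (is_modifier(m[run]))
        run++;
    if (m[run] == '$') {                 /* ~[modifiers]$NAME:X */
        const char *name = m + run + 1;
        const char *colon = strchr(name, ':');
        if (colon == NULL || run >= sizeof r->modifiers
            || (size_t)(colon - name) >= sizeof r->env)
            return 0;                    /* no ':' found: nothing is expanded */
        if (!is_active(colon[1], active))
            return 0;                    /* includes a trailing ':' (the crash case above) */
        memcpy(r->modifiers, m, run);
        memcpy(r->env, name, (size_t)(colon - name));
        r->var = colon[1];
        return (size_t)(colon + 2 - p);
    }
    /* As many modifiers as possible, provided an active variable follows:
     * try the longest prefix first, then shorter ones. */
    for (size_t k = run + 1; k-- > 0; ) {
        if (k < sizeof r->modifiers && is_active(m[k], active)) {
            memcpy(r->modifiers, m, k);
            r->var = m[k];
            return k + 2;                /* '~' + k modifiers + the variable */
        }
    }
    return 0;
}
```

With %%f and %%x both active, `%~fx` is parsed as modifier f applied to variable x. With only %%f active, `%~f` is just the variable f. `%~dp$PATH:a` yields the modifiers `dp`, the environment name `PATH`, and the variable a.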