I'm working on some path-parsing C++ code and I've been experimenting with a lot of the Windows APIs for this. Is there a difference between PathGetArgs/PathRemoveArgs and a slightly-massaged CommandLineToArgvW?
In other words, aside from length/cleanness, is this:
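std::wstring StripFileArguments(std::wstring filePath)
{
    WCHAR tempPath[MAX_PATH];
    // Bounded copy (StringCchCopyW, from strsafe.h): plain wcscpy overflows tempPath for inputs of MAX_PATH characters or more
    if (FAILED(StringCchCopyW(tempPath, MAX_PATH, filePath.c_str())))
        return filePath; // too long for the buffer; hand the input back unchanged
    PathRemoveArgs(tempPath);
    return tempPath;
}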
different from this:
std::wstring StripFileArguments(std::wstring filePath)
{
    int argCount = 0;
    LPWSTR* argList = CommandLineToArgvW(filePath.c_str(), &argCount);
    if (argList == NULL)
        return filePath; // parsing failed, so argCount is not meaningful
    std::wstring tempPath = filePath;
    if (argCount > 0)
        tempPath = argList[0]; // ignore any elements after the first because those are args, not the base app
    LocalFree(argList); // free on every path, not only when argCount > 0
    return tempPath;
}
and is this
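std::wstring GetFileArguments(std::wstring filePath)
{
    WCHAR tempArgs[MAX_PATH];
    // Bounded copy (StringCchCopyW, from strsafe.h) instead of an unchecked wcscpy
    if (FAILED(StringCchCopyW(tempArgs, MAX_PATH, filePath.c_str())))
        return std::wstring();
    // PathGetArgs returns a pointer into tempArgs; copying it back onto tempArgs with wcscpy would be an overlapping copy
    return PathGetArgs(tempArgs);
}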
different from
std::wstring GetFileArguments(std::wstring filePath)
{
    int argCount = 0;
    std::wstring tempArgs;
    LPWSTR* argList = CommandLineToArgvW(filePath.c_str(), &argCount);
    if (argList == NULL)
        return tempArgs; // parsing failed
    for (int counter = 1; counter < argCount; counter++) // ignore the first element (counter = 0) because that's the base app, not args
    {
        if (!tempArgs.empty())
            tempArgs += L" "; // separator only between arguments, so the result has no leading space
        tempArgs += argList[counter];
    }
    LocalFree(argList);
    return tempArgs;
}
?
It looks to me like PathGetArgs/PathRemoveArgs just provide a cleaner, simpler special-case implementation of the CommandLineToArgvW parsing, but I'd like to know if there are any corner cases in which the APIs will behave differently.
The functions are similar but not identical; the differences mostly concern how quoted strings are handled.
PathGetArgs returns a pointer to the first character following the first space in the input string. If a quote character is encountered before the first space, another quote is required before the function will start looking for spaces again. If no space is found the function returns a pointer to the end of the string.
PathRemoveArgs calls PathGetArgs and then uses the returned pointer to terminate the string. It will also strip a trailing space if the first space encountered happened to be at the end of the line.
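The scanning rule described above can be sketched in portable C++. This is only an illustration of the description, not the shlwapi implementation; the names SketchArgsOffset, SketchGetArgs and SketchRemoveArgs are hypothetical, and the sketch works on std::wstring copies rather than modifying a buffer in place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the PathGetArgs scan: returns the index of the
// first character after the first unquoted space, or text.size() if there
// is no such space.
std::size_t SketchArgsOffset(const std::wstring& text)
{
    bool inQuotes = false;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == L'"')
            inQuotes = !inQuotes;      // spaces are ignored until the closing quote
        else if (text[i] == L' ' && !inQuotes)
            return i + 1;              // the arguments start right after this space
    }
    return text.size();                // no unquoted space: no arguments
}

// Hypothetical stand-in for PathGetArgs.
std::wstring SketchGetArgs(const std::wstring& text)
{
    return text.substr(SketchArgsOffset(text));
}

// Hypothetical stand-in for PathRemoveArgs.
std::wstring SketchRemoveArgs(const std::wstring& text)
{
    std::wstring path = text.substr(0, SketchArgsOffset(text));
    if (!path.empty() && path.back() == L' ')
        path.pop_back();               // drop the separating (or trailing) space
    return path;
}
```

Note that under this rule the quotes around the path are kept: for the input "c:\my app.exe" arg1 (with the quotes), SketchRemoveArgs returns "c:\my app.exe" still quoted, whereas CommandLineToArgvW would return the first item without them.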
CommandLineToArgvW takes the supplied string and splits it into an array, using spaces to delimit the items. The first item in the array can be quoted to allow spaces. The second and subsequent items can also be quoted, but they support slightly more complex processing: arguments can also include embedded quotes by preceding them with a backslash. For example:
"c:\program files\my app\my app.exe" arg1 "argument 2" "arg \"number\" 3"
This would produce an array with four entries:
argv[0] - c:\program files\my app\my app.exe
argv[1] - arg1
argv[2] - argument 2
argv[3] - arg "number" 3
See the CommandLineToArgvW docs for a full description of the parsing rules, including how you can have embedded backslashes as well as quotes in the arguments.
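To make the rules above concrete, here is a simplified, portable sketch of that kind of splitting. It is not the real CommandLineToArgvW: the name SketchSplitCommandLine is hypothetical, the handling of an empty input differs (the real API returns the executable path), and some documented edge cases, such as doubled quotes inside a quoted argument, are deliberately left out. It assumes the usual backslash convention: backslashes are literal unless they precede a quote, an even run before a quote halves and the quote toggles quoting, and an odd run halves and yields a literal quote.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for CommandLineToArgvW.
std::vector<std::wstring> SketchSplitCommandLine(const std::wstring& cmd)
{
    std::vector<std::wstring> args;
    const std::size_t n = cmd.size();
    std::size_t i = 0;
    auto isBlank = [](wchar_t c) { return c == L' ' || c == L'\t'; };

    if (n == 0)
        return args;   // assumption: the real API returns the executable path here

    // First item (program name): may be quoted, no backslash escapes.
    std::wstring program;
    if (cmd[0] == L'"')
    {
        i = 1;
        while (i < n && cmd[i] != L'"') program += cmd[i++];
        if (i < n) ++i;                               // skip the closing quote
    }
    else
    {
        while (i < n && !isBlank(cmd[i])) program += cmd[i++];
    }
    args.push_back(program);

    // Remaining items: quotes toggle, backslash-quote embeds a literal quote.
    for (;;)
    {
        while (i < n && isBlank(cmd[i])) ++i;         // skip separators
        if (i >= n) break;

        std::wstring arg;
        bool inQuotes = false;
        while (i < n && (inQuotes || !isBlank(cmd[i])))
        {
            if (cmd[i] == L'\\')
            {
                std::size_t slashes = 0;
                while (i < n && cmd[i] == L'\\') { ++slashes; ++i; }
                if (i < n && cmd[i] == L'"')
                {
                    arg.append(slashes / 2, L'\\');   // each pair yields one backslash
                    if (slashes % 2 == 1) arg += L'"';  // odd run: literal quote
                    else inQuotes = !inQuotes;          // even run: quote toggles
                    ++i;
                }
                else
                {
                    arg.append(slashes, L'\\');       // not before a quote: literal
                }
            }
            else if (cmd[i] == L'"')
            {
                inQuotes = !inQuotes;
                ++i;
            }
            else
            {
                arg += cmd[i++];
            }
        }
        args.push_back(arg);
    }
    return args;
}
```

Feeding the example command line from above into this sketch yields the same four entries, with the quotes removed and the embedded quotes around number preserved.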
Yes, I've observed different behaviour with the current SDK (VS2015 Update 3 + Windows 1607 Anniversary SDK with SDK version set to 8.1):
Calling CommandLineToArgvW with an empty lpCmdLine (what you get from wWinMain when no arguments were passed) returns the program path and filename, split at every space. The path was not part of the parameter, so the function must have fetched it itself, but it does not account for the spaces inside that path:
lpCmdLine = ""
argv[0] = C:\Program
argv[1] = Files\Vendor\MyProgram.exe
Calling CommandLineToArgvW with an lpCmdLine that contains parameters does not include the program path and name, so it works as expected (as long as the parameters contain no further spaces...):
lpCmdLine = "One=1 Two=\"2\""
argv[0] = One=1
argv[1] = Two=2
Note that it also strips the quotes inside the parameters.
CommandLineToArgvW doesn't like the first parameter in the format Text=\"Quoted spaces\" (presumably because it parses the first item as the program name rather than as an ordinary argument), so if you pass lpCmdLine to it directly, it incorrectly splits the key=value pairs that contain spaces:
lpCmdLine = "One=\"Number One\" Two=\"Number Two\""
argv[0] = One=\"Number
argv[1] = One\"
argv[2] = Two=\"Number
argv[3] = Two\"
It's kind of documented here:
https://msdn.microsoft.com/en-us/library/windows/desktop/bb776391(v=vs.85).aspx
But this behaviour with spaces in the program path was not expected. It seems like a bug to me. I'd prefer the same data to be processed in both situations, because if I really wanted the path to the executable I'd call GetCommandLineW() instead.
The only sensible, consistent solution in my opinion is to ignore lpCmdLine entirely: call GetCommandLineW(), pass the result to CommandLineToArgvW(), then skip the first element if you are not interested in the program path. That way all combinations are supported, i.e. paths with and without spaces, and parameters with nested quotes with and without spaces.
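int argumentCount = 0;
LPWSTR commandLine = GetCommandLineW();
LPWSTR *arguments = CommandLineToArgvW(commandLine, &argumentCount); // NULL on failure
// arguments[0] is the program path; the parameters start at arguments[1].
// Release the array with LocalFree(arguments) when finished.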
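One possible way to wrap this approach is sketched below. The names SkipProgramPath and GetOwnParameters are hypothetical. The Windows-specific part is guarded with _WIN32 only so that the helper can also be compiled and exercised elsewhere; on Windows the snippet is assumed to be linked against Shell32.lib.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>   // CommandLineToArgvW
#endif

// Hypothetical helper: copies every element except argv[0] (the program path).
std::vector<std::wstring> SkipProgramPath(wchar_t* const* argv, int argc)
{
    std::vector<std::wstring> parameters;
    for (int i = 1; i < argc; ++i)
        parameters.push_back(argv[i]);
    return parameters;
}

#ifdef _WIN32
// Hypothetical wrapper: parses the full command line of the current process
// and returns only the parameters, ignoring wWinMain's lpCmdLine entirely.
std::vector<std::wstring> GetOwnParameters()
{
    int argumentCount = 0;
    LPWSTR* arguments = CommandLineToArgvW(GetCommandLineW(), &argumentCount);
    if (arguments == NULL)
        return {};                     // parsing failed
    std::vector<std::wstring> parameters = SkipProgramPath(arguments, argumentCount);
    LocalFree(arguments);              // the returned array must be freed by the caller
    return parameters;
}
#endif
```

Copying the strings into a std::vector before calling LocalFree means callers do not have to track the lifetime of the array returned by CommandLineToArgvW.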