Can you use a UTF-8 string as the Arguments for a StartInfo?
I am trying to pass a UTF-8 string (in this case, Japanese text) to an application as a console argument.
Something like this (this is just an example; cmd.exe stands in for a custom app):
var process = new System.Diagnostics.Process();
process.StartInfo.Arguments = "/K \"echo これはテストです\"";
process.StartInfo.FileName = "cmd.exe";
process.StartInfo.UseShellExecute = true;
process.Start();
process.WaitForExit();
Executing this seems to lose the UTF-8 string, and all the target application sees is "echo ?????????".
When I run the command directly on the command line (by pasting the arguments), the target application receives the string correctly, even though the console itself doesn't display it correctly.
Do I need to do anything special to enable UTF-8 support in the arguments, or is this just not supported?
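Programs receive their command lines in UTF-16, the same encoding as .NET strings. You can check this by having cmd write the text to a file; /U makes cmd's internal commands write Unicode output to files and pipes: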
Arguments = "/U /K \"echo これはテストです> output.txt\"";
It is the console window that cannot display characters outside its current code page/selected font. However, I assume you don't actually want to call echo, so whether this works depends entirely on how the program you are calling is written.
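A fuller version of that check, as a sketch assuming Windows and .NET 5 or later (top-level statements). The helper name EchoThroughCmd and the file name utf16-check.txt are hypothetical. It uses /C instead of /K so that cmd exits and WaitForExit returns, and it reads the file back as UTF-16 because that is what /U produces.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

// Hypothetical helper: has cmd.exe echo `text` into a file and reads it back.
// /U makes cmd write UTF-16LE (no BOM) to files and pipes; /C (rather than /K)
// makes cmd exit so that WaitForExit returns. Windows only.
string EchoThroughCmd(string text)
{
    string dir = Path.GetTempPath();
    const string fileName = "utf16-check.txt"; // hypothetical file name
    var psi = new ProcessStartInfo
    {
        FileName = "cmd.exe",
        Arguments = $"/U /C \"echo {text}> {fileName}\"",
        WorkingDirectory = dir,   // the relative fileName resolves here
        UseShellExecute = false,  // required for CreateNoWindow to take effect
        CreateNoWindow = true,
    };
    using (Process process = Process.Start(psi)!)
    {
        process.WaitForExit();
    }

    string path = Path.Combine(dir, fileName);
    // Read the file as UTF-16LE and drop echo's trailing CR/LF.
    string result = File.ReadAllText(path, Encoding.Unicode).TrimEnd('\r', '\n');
    File.Delete(path);
    return result;
}

if (OperatingSystem.IsWindows())
{
    string echoed = EchoThroughCmd("これはテストです");
    Console.WriteLine(echoed == "これはテストです"); // True if cmd got the UTF-16 text intact
}
```

If this prints True while the console window still shows question marks, the argument itself arrived intact and only the display is at fault.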
Some background info: C or C++ programs that use the 'narrow' (system code page) entry points, e.g. main(int argc, char** argv), rather than the 'wide' (UTF-16) entry points, e.g. wmain(int argc, wchar_t** argv), are called by a stub that converts the command line to the system code page, which by default cannot be UTF-8.
By far the best option is to change the program to use a wide entry point and simply get the same UTF-16 as you had in your .NET string. If that is not possible, one trick you could try is to pass it a UTF-16 command line that, when converted to the system code page, becomes the UTF-8 bytes for the characters you want it to use:
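// args is the original argument string. Works on .NET Framework; on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so this line is a no-op there.
Arguments = Encoding.Default.GetString(Encoding.UTF8.GetBytes(args));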
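A minimal sketch of that trick as a full round trip, assuming .NET 5 or later. It names the code page explicitly rather than relying on Encoding.Default, and it uses ISO-8859-1 (code page 28591) as an assumed stand-in for the system ANSI code page, because that code page maps all 256 byte values; on a real machine the relevant code page is whatever GetACP() reports. Smuggle and Unsmuggle are hypothetical names for the caller's half and the child program's half.

```csharp
using System;
using System.Text;

// Assumption: ISO-8859-1 (code page 28591) stands in for the system ANSI code
// page. It maps every byte 0x00-0xFF to exactly one character and back.
Encoding ansi = Encoding.GetEncoding(28591);

// Caller side (.NET): reinterpret the UTF-8 bytes as ANSI characters, so the
// UTF-16 argument string holds one character per UTF-8 byte.
string Smuggle(string text) => ansi.GetString(Encoding.UTF8.GetBytes(text));

// Child side: the narrow startup code converts the UTF-16 command line to ANSI
// bytes for argv, and the program then interprets those bytes as UTF-8.
string Unsmuggle(string commandLine) => Encoding.UTF8.GetString(ansi.GetBytes(commandLine));

string original = "これはテストです";
string arguments = Smuggle(original);
Console.WriteLine(arguments.Length);                 // 24: eight 3-byte UTF-8 sequences
Console.WriteLine(Unsmuggle(arguments) == original); // True
```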
Caveat coder: don't be surprised if this goes horribly wrong on your machine or someone else's. It depends on every possible byte being valid in the current system code page, on the system code page being the same as when your program was started, on the program you are running not passing the data to any encoding-dependent Windows function (those with A- and W-suffixed versions), and so on.
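To see why the first caveat matters, here is the round trip the trick relies on, run through a code page in which some byte values are undefined. US-ASCII (code page 20127) is used as an exaggerated, assumed stand-in: every byte of UTF-8-encoded Japanese is 0x80 or above, so none of them survive. RoundTrip is a hypothetical helper, and the sketch assumes .NET 5 or later.

```csharp
using System;
using System.Text;

// Assumption: US-ASCII (code page 20127) stands in for a code page in which
// some byte values are undefined. Undefined bytes decode to '?'.
Encoding ascii = Encoding.GetEncoding(
    20127,
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback("?"));

// Hypothetical helper: the full caller-to-child round trip through `ansi`.
string RoundTrip(string text, Encoding ansi)
{
    string arguments = ansi.GetString(Encoding.UTF8.GetBytes(text)); // caller side
    byte[] argvBytes = ansi.GetBytes(arguments);                    // child's argv
    return Encoding.UTF8.GetString(argvBytes);                      // read as UTF-8
}

string result = RoundTrip("これはテストです", ascii);
Console.WriteLine(result); // only '?' characters: the original bytes are gone
```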
It depends entirely on the program you are trying to start. The Process class fully supports Unicode, as does the operating system, but the program might be old and use 8-bit characters. It would then use GetCommandLineA(), the ANSI version of the native Unicode GetCommandLineW() API function, to retrieve the command line arguments. That function translates the Unicode string to 8-bit characters using the system default code page, as configured in Control Panel > Regional and Language Options > Language for non-Unicode programs; in effect, WideCharToMultiByte() with CP_ACP.
If that is not the Japanese code page, the translation produces question marks, because the Japanese characters have no code in that code page. Switching the system code page isn't usually very desirable for non-Japanese speakers. UTF-8 certainly won't work either; the program isn't going to expect it. Consider running this program in a virtual machine configured with a Japanese system locale.
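The translation can be approximated in managed code. This is a sketch assuming .NET 5 or later, where the legacy Windows code pages 932 (Japanese) and 1252 (Western) are made available through CodePagesEncodingProvider. ThroughAnsi is a hypothetical helper that encodes with '?' as the replacement character (mirroring the usual default character of WideCharToMultiByte) and decodes the bytes back for display.

```csharp
using System;
using System.Text;

// Legacy Windows code pages such as 932 and 1252 must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Rough model of WideCharToMultiByte(CP_ACP, ...) for a narrow program:
// characters with no code in the ANSI code page become '?'.
string ThroughAnsi(string text, int ansiCodePage)
{
    Encoding cp = Encoding.GetEncoding(
        ansiCodePage,
        new EncoderReplacementFallback("?"),
        new DecoderReplacementFallback("?"));
    return cp.GetString(cp.GetBytes(text));
}

const string text = "これはテストです";
Console.WriteLine(ThroughAnsi(text, 932) == text); // True: Japanese system code page
Console.WriteLine(ThroughAnsi(text, 1252));        // ????????: Western system code page
```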