In a C program I'm using wprintf to print Unicode (UTF-16) text in a Windows console. This works fine, but when the output of the program is redirected to a log file, the log file has a corrupted UTF-16 encoding. When redirection is done in a Windows Command Prompt, all line breaks are encoded as a narrow ASCII line break (0d0a). When redirection is done in PowerShell, null characters are inserted.
Is it possible to redirect the output to a proper UTF-16 log file?
Example program:
#include <stdio.h>
#include <windows.h>
#include <fcntl.h>
#include <io.h>
int main(void) {
    int prevmode;
    prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
    fwprintf(stdout, L"one\n");
    fwprintf(stdout, L"two\n");
    fwprintf(stdout, L"three\n");
    _setmode(_fileno(stdout), prevmode);
    return 0;
}
Redirecting the output in Command Prompt. See the 0d0a which should be 0d00 0a00:
c:\test>.\testu16.exe > o.txt
c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d0a 0074 0077 006f 000d o.n.e....t.w.o..
0000010: 0a00 7400 6800 7200 6500 6500 0d0a 00 ..t.h.r.e.e....
Redirecting the output in PowerShell. See all the 0000 inserted.
PS C:\test> .\testu16.exe > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 7400 0000 7700 0000 6f00 0000 ....t...w...o...
0000020: 0d00 0a00 0000 7400 0000 6800 0000 7200 ......t...h...r.
0000030: 0000 6500 0000 6500 0000 0d00 0a00 0000 ..e...e.........
0000040: 0d00 0a00 ....
I got this answer from Hans Passant. Thanks Hans.
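The wrong line breaks are a side effect of stdout buffering. The wide text is still sitting in the stream buffer when _setmode switches back to the previous mode, so it only gets written at exit, in narrow text mode, where every 0a byte is expanded to 0d 0a. We need to flush the stream before we set the mode back to the original mode.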
prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout); /* flush stream */
_setmode(_fileno(stdout), prevmode);
Redirecting the output in Command Prompt (cmd.exe) creates a correct UTF-16 file, without BOM.
c:\test>.\testu16 > o.txt
c:\test>xxd o.txt
0000000: 6f00 6e00 6500 0d00 0a00 7400 7700 6f00 o.n.e.....t.w.o.
0000010: 0d00 0a00 7400 6800 7200 6500 6500 0d00 ....t.h.r.e.e...
0000020: 0a00 ..
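To see where the stray 0d0a in the unflushed run came from, here is a small simulation in portable C. It is only a sketch of narrow text-mode translation, not the CRT's actual code, and the name text_mode_translate is made up for illustration. It inserts a 0d byte in front of every 0a byte, which is what appears to happen to the buffered UTF-16 bytes when they are written after the mode has been switched back.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of narrow text-mode output (hypothetical helper, not
   the real CRT code): every 0x0A byte gets a 0x0D byte inserted in front
   of it, with no regard for UTF-16 code unit boundaries. */
size_t text_mode_translate(const unsigned char *in, size_t n,
                           unsigned char *out, size_t cap)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0x0A) {
            assert(w < cap);      /* caller must supply enough room */
            out[w++] = 0x0D;
        }
        assert(w < cap);
        out[w++] = in[i];
    }
    return w;                     /* number of bytes written to out */
}
```

Feeding it the UTF-16LE bytes of L"one\n" (6f 00 6e 00 65 00 0a 00) yields 6f 00 6e 00 65 00 0d 0a 00, the same pattern as the start of the first o.txt dump.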
In PowerShell the output is still wrong.
PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700 ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00 ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500 ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00 ..e.............
0000050: 0000 0d00 0a00 ......
This is because PowerShell doesn't pass the stream through untouched. It tries to interpret it as text and convert it to UTF-16. It guessed that the input stream encoding was ANSI, so it treated every byte, including each 00, as a separate character. PowerShell added a UTF-16 BOM, and the rest is double-encoded UTF-16. This explains the extra zeros.
Even piping to Out-File and specifying the encoding doesn't help.
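The double encoding can be modelled in a few lines of portable C. This is a rough sketch under stated assumptions, not PowerShell's real logic: the helper name widen_bytes_to_utf16le is invented, it only handles byte values below 0x80 (where a single-byte code page maps each byte to the same code point), and it ignores the line splitting that adds the extra 0d00 0a00 pairs in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the re-encoding step: treat each input byte as one
   single-byte character and write it out as a UTF-16LE code unit, after a
   leading BOM. Assumes all input bytes are below 0x80. */
size_t widen_bytes_to_utf16le(const unsigned char *in, size_t n,
                              unsigned char *out, size_t cap)
{
    size_t w = 0;
    assert(cap >= 2 + 2 * n);     /* BOM plus two bytes per input byte */
    out[w++] = 0xFF;              /* U+FEFF, low byte first */
    out[w++] = 0xFE;
    for (size_t i = 0; i < n; i++) {
        out[w++] = in[i];         /* low byte of the code unit */
        out[w++] = 0x00;          /* high byte is zero for values < 0x80 */
    }
    return w;                     /* number of bytes written to out */
}
```

With the input 6f 00 6e 00 65 00 (the UTF-16LE bytes of "one") this produces ff fe 6f 00 00 00 6e 00 00 00 65 00 00 00, which matches the beginning of the p.txt dump.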
PS C:\test> .\testu16.exe | out-file p.txt -encoding unicode
PS C:\test> xxd p.txt
0000000: fffe 6f00 0000 6e00 0000 6500 0000 0d00 ..o...n...e.....
0000010: 0a00 0000 0d00 0a00 0000 7400 0000 7700 ..........t...w.
0000020: 0000 6f00 0000 0d00 0a00 0000 0d00 0a00 ..o.............
0000030: 0000 7400 0000 6800 0000 7200 0000 6500 ..t...h...r...e.
0000040: 0000 6500 0000 0d00 0a00 0000 0d00 0a00 ..e.............
0000050: 0000 0d00 0a00 ......
PowerShell needs to be informed about the encoding, which is done by first printing a UTF-16 BOM:
prevmode = _setmode(_fileno(stdout), _O_U16TEXT);
fwprintf(stdout, L"\xfeff"); /* UTF-16LE BOM */
fwprintf(stdout,L"one\n");
fwprintf(stdout,L"two\n");
fwprintf(stdout,L"three\n");
fflush(stdout); /* flush stream */
_setmode(_fileno(stdout), prevmode);
Now we get a correct UTF-16 file.
PS C:\test> .\testu16 > p.txt
PS C:\test> xxd p.txt
0000000: fffe 6f00 6e00 6500 0d00 0a00 7400 7700 ..o.n.e.....t.w.
0000010: 6f00 0d00 0a00 7400 6800 7200 6500 6500 o.....t.h.r.e.e.
0000020: 0d00 0a00
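The BOM is the code unit U+FEFF, which in little-endian byte order becomes the ff fe pair at the start of the dump. As a rough illustration of how a reader could sniff for it, here is a portable C sketch. Both helper names are hypothetical, and this is not a description of how PowerShell itself detects the encoding.

```c
#include <stddef.h>

/* Serialises one UTF-16 code unit in little-endian order (low byte first). */
void utf16le_put(unsigned int unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)((unit >> 8) & 0xFF);
}

/* Returns 1 if the buffer starts with the UTF-16LE BOM bytes FF FE,
   0 otherwise. */
int starts_with_utf16le_bom(const unsigned char *buf, size_t n)
{
    return n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE;
}
```

Passing 0xFEFF to utf16le_put gives the bytes ff fe, so a buffer that begins with them passes the check, while the BOM-less cmd.exe output (which starts with 6f 00) does not.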
">" will always redirect your console UTF-16 output as printable "ASCII", even if you put a BOM in your output or use prevmode = _setmode(_fileno(stdout), _O_BINARY);. I have the same problem on Windows 7; there is no way to do this with fwprintf.