Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is Writeln capable of supporting Unicode?

Consider this program:

{$APPTYPE CONSOLE}

begin
  Writeln('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
end.

The output on my console which uses the Consolas font is:

????????Z??????????????????????????????????????

The Windows console is quite capable of supporting Unicode as evidenced by this program:

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

const
  Text = 'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ';

var
  NumWritten: DWORD;

begin
  WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(Text), Length(Text), NumWritten, nil);
end.

for which the output is:

АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ

Can Writeln be persuaded to respect Unicode, or is it inherently crippled?

like image 606
David Heffernan Avatar asked Oct 08 '14 10:10

David Heffernan


3 Answers

Just set the console output codepage through the SetConsoleOutputCP() routine with codepage cp_UTF8.

program Project1;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,Windows;
Const
  Text =  'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ';
VAR
  NumWritten: DWORD;
begin
  ReadLn;  // Make sure Consolas font is selected
  try
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(Text), Length(Text), NumWritten, nil);    
    SetConsoleOutputCP(CP_UTF8);
    WriteLn;
    WriteLn('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

Outputs:

АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ
АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ

WriteLn() translates Unicode UTF16 strings to the selected output codepage (cp_UTF8) internally.


Update:

The above works in Delphi-XE2 and above. In Delphi-XE you need an explicit conversion to UTF-8 to make it work properly.

WriteLn(UTF8String('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ'));

Addendum:

If an output to the console is done in another codepage before calling SetConsoleOutputCP(cp_UTF8), the OS will not correctly output text in utf-8. This can be fixed by closing/reopening the stdout handler.

Another option is to declare a new text output handler for utf-8.

var
  toutUTF8: TextFile;
...
SetConsoleOutputCP(CP_UTF8);
AssignFile(toutUTF8,'',cp_UTF8);  // Works in XE2 and above
Rewrite(toutUTF8);
WriteLn(toutUTF8,'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
like image 91
LU RD Avatar answered Nov 20 '22 12:11

LU RD


The System unit declares a variable named AlternateWriteUnicodeStringProc that allows customisation of how Writeln performs output. This program:

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

function MyAlternateWriteUnicodeStringProc(var t: TTextRec; s: UnicodeString): Pointer;
var
  NumberOfCharsWritten, NumOfBytesWritten: DWORD;
begin
  Result := @t;
  if t.Handle = GetStdHandle(STD_OUTPUT_HANDLE) then
    WriteConsole(t.Handle, Pointer(s), Length(s), NumberOfCharsWritten, nil)
  else
    WriteFile(t.Handle, Pointer(s)^, Length(s)*SizeOf(WideChar), NumOfBytesWritten, nil);
end;

var
  UserFile: Text;

begin
  AlternateWriteUnicodeStringProc := MyAlternateWriteUnicodeStringProc;
  Writeln('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ');
  Readln;
end.

produces this output:

АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ

I'm sceptical of how I've implemented MyAlternateWriteUnicodeStringProc and how it would interact with classic Pascal I/O. However, it appears to behave as desired for output to the console.

The documentation of AlternateWriteUnicodeStringProc currently says, wait for it, ...

Embarcadero Technologies does not currently have any additional information. Please help us document this topic by using the Discussion page!

like image 31
David Heffernan Avatar answered Nov 20 '22 13:11

David Heffernan


WriteConsoleW seems to be a quite magical function.

procedure WriteLnToConsoleUsingWriteFile(CP: Cardinal; AEncoding: TEncoding; const S: string);
var
  Buffer: TBytes;
  NumWritten: Cardinal;
begin
  Buffer := AEncoding.GetBytes(S);
  // This is a side effect and should be avoided ...
  SetConsoleOutputCP(CP);
  WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), Buffer[0], Length(Buffer), NumWritten, nil);
  WriteLn;
end;

procedure WriteLnToConsoleUsingWriteConsole(const S: string);
var
  NumWritten: Cardinal;
begin
  WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(S), Length(S), NumWritten, nil);
  WriteLn;
end;

const
  Text = 'АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢѸФХѾЦЧШЩЪЫЬѢѤЮѦѪѨѬѠѺѮѰѲѴ';
begin
  ReadLn; // Make sure Consolas font is selected
  // Works, but changing the console CP is neccessary
  WriteLnToConsoleUsingWriteFile(CP_UTF8, TEncoding.UTF8, Text);
  // Doesn't work
  WriteLnToConsoleUsingWriteFile(1200, TEncoding.Unicode, Text);
  // This does and doesn't need the CP anymore
  WriteLnToConsoleUsingWriteConsole(Text);
  ReadLn;
end.

So in summary:

WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ...) supports UTF-16.

WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), ...) doesn't support UTF-16.

My guess would be that in order to support different ANSI encodings the classic Pascal I/O uses the WriteFile call.

Also keep in mind that when used on a file instead of the console it has to work as well:

unicode text file output differs between XE2 and Delphi 2009?

That means that blindly using WriteConsole breaks output redirection. If you use WriteConsole you should fall back to WriteFile like this:

var
  NumWritten: Cardinal;
  Bytes: TBytes;
begin
  if not WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), PChar(S), Length(S),
    NumWritten, nil) then
  begin
    Bytes := TEncoding.UTF8.GetBytes(S);
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), Bytes[0], Length(Bytes),
      NumWritten, nil);
  end;
  WriteLn;
end;

Note that output redirection with any encoding works fine in cmd.exe. It just writes the output stream to the file unchanged.

PowerShell however expects either ANSI output or the correct preamble (/ BOM) has to be included at the start of the output (or the file will be malencoded!). Also PowerShell will always convert the output into UTF-16 with preamble.

MSDN recommends using GetConsoleMode to find out if the standard handle is a console handle, also the BOM is mentioned:

WriteConsole fails if it is used with a standard handle that is redirected to a file. If an application processes multilingual output that can be redirected, determine whether the output handle is a console handle (one method is to call the GetConsoleMode function and check whether it succeeds). If the handle is a console handle, call WriteConsole. If the handle is not a console handle, the output is redirected and you should call WriteFile to perform the I/O. Be sure to prefix a Unicode plain text file with a byte order mark. For more information, see Using Byte Order Marks.

like image 5
Jens Mühlenhoff Avatar answered Nov 20 '22 12:11

Jens Mühlenhoff