What is the simplest way to check whether a string can be safely converted to AnsiString in XE4 and above?

Tags: unicode, delphi

In Delphi XE4 and above, we may write something like:

function TestAnsiCompatible(const aStr: string): Boolean;
begin
end;

In Delphi XE4, string is an alias for UnicodeString, so it can hold any Unicode text.

If we do some type conversion:

function TestAnsiCompatible(const aStr: string): Boolean;
var a: AnsiString;
begin
  a := aStr;
  Result := a = aStr;
end;

The compiler then emits these warnings:

[dcc32 Warning]: W1058 Implicit string cast with potential data loss from 'string' to 'AnsiString'
[dcc32 Warning]: W1057 Implicit string cast from 'AnsiString' to 'string'

Is there a simpler, neater way to test whether aStr is fully compatible with AnsiString? Or do we have to check it character by character:

function TestAnsiCompatible(const aStr: string): Boolean;
var C: Char;
begin
  Result := True;
  for C in aStr do begin
    if C > #127 then begin
      Result := False;
      Break;
    end;
  end;
end;
Chau Chee Yang asked Jun 22 '14 14:06


2 Answers

All you have to do is type-cast away the warnings:

function TestAnsiCompatible(const aStr: string): Boolean;
var
  a: AnsiString;
begin
  a := AnsiString(aStr);
  Result := String(a) = aStr;
end;

Which can be simplified to this:

function TestAnsiCompatible(const aStr: string): Boolean;
begin
  Result := String(AnsiString(aStr)) = aStr;
end;
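
As a minimal usage sketch (the program name and the sample inputs are illustrative assumptions, not part of the original answer), the one-liner can be tried in a console program. The casts convert through the system's default ANSI code page, so the result for non-ASCII input can differ from machine to machine:

```delphi
program AnsiRoundTripDemo;

{$APPTYPE CONSOLE}

function TestAnsiCompatible(const aStr: string): Boolean;
begin
  // Explicit casts suppress W1058/W1057. The round trip goes through the
  // system's default ANSI code page (DefaultSystemCodePage).
  Result := string(AnsiString(aStr)) = aStr;
end;

begin
  // Plain ASCII text survives the round trip.
  Writeln(TestAnsiCompatible('Hello'));
  // U+4E2D (a CJK character): the result depends on the machine's ANSI code page.
  Writeln(TestAnsiCompatible(#$4E2D));
  // Show which code page the conversion used on this machine.
  Writeln('Default ANSI code page: ', DefaultSystemCodePage);
end.
```

The second line of output is deliberately not predicted here, because it depends on the code page printed in the third line.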
Remy Lebeau answered Oct 12 '22 22:10


I used to use a round-trip check like String(AnsiString(a)) = a, until a user transferred data from one PC to another that used a different code page, and the data could no longer be read back properly. So I changed my definition of "safe" to "the string can be represented in code page 1252" (the code page of the region where most of my users are). When reading my data back, I know I have to convert the string from code page 1252.

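function StringIs1252(const S: UnicodeString): Boolean;
// Returns True if every character of S can be represented in code page 1252
// (Western European, Windows). Cyrillic would be 1251.
// Requires Winapi.Windows in the uses clause (WideCharToMultiByte, BOOL).
const
  // Map unmappable characters to the default character instead of a best-fit look-alike.
  WC_NO_BEST_FIT_CHARS = $00000400;
var
  UsedDefaultChar: BOOL;   // must be BOOL (4 bytes), not Boolean
  Len: Integer;
begin
  if Length(S) = 0 then
    Exit(True);
  UsedDefaultChar := False;
  // Size-only call (nil output buffer); UsedDefaultChar reports whether any character had no mapping.
  Len := WideCharToMultiByte(1252, WC_NO_BEST_FIT_CHARS, PWideChar(S), Length(S), nil, 0, nil, @UsedDefaultChar);
  if Len <> 0 then
    Result := not UsedDefaultChar
  else
    Result := False;   // the conversion call itself failed
end;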

But if you want to check whether your string can be safely converted to ANSI regardless of the code page used when writing or reading, then you should check that all characters are in the range #0..#127.
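
A minimal sketch of that code-page-independent check follows. The function name IsPureAscii is a hypothetical one chosen for illustration. It returns as soon as it finds a character above #127:

```delphi
function IsPureAscii(const S: string): Boolean;
var
  C: Char;
begin
  for C in S do
    if Ord(C) > 127 then
      Exit(False);   // found a character outside #0..#127
  Result := True;    // empty strings and pure ASCII text end up here
end;
```

For example, IsPureAscii('abc') returns True, while IsPureAscii('caf' + #$00E9) returns False because U+00E9 is above #127, even though that character exists in code page 1252.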

Sebastian Z answered Oct 12 '22 23:10