Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Working with Unicode strings in Delphi 7

I need to write a program which will browse through strings of various lengths and select only those which are written using symbols from set defined by me (particularly Japanese letters). Strings will contain words written in different languages (German, French, Arabic, Russian, English etc). Obviously there is huge number of possible characters. I do not know which structure to use for that? I am using Delphi 7 right now. Can anybody suggest how to write such program?

like image 386
Tofig Hasanov Avatar asked Feb 17 '10 14:02

Tofig Hasanov


People also ask

What is Unicode Delphi?

The delphi string is an implementation of UTF-16 in which the "normal" graphic characters corresponding to 1 Char (2 bytes), but having others graphic characters being represented by 2 Chars (4 bytes), the so-called surrogate pair. There is no such thing as a "normal graphic character" in Unicode.

Is Unicode the same as string?

Unicode is a standard encoding system that is used to represent characters from almost all languages. Every Unicode character is encoded using a unique integer code point between 0 and 0x10FFFF . A Unicode string is a sequence of zero or more code points.

How many characters can a string hold Delphi?

The Delphi language supports short-string types - in effect, subtypes of ShortString - whose maximum length is anywhere from 0 to 255 characters.

What is string in Delphi?

A string represents a sequence of characters. Delphi supports the following predefined string types. String types. Type. Maximum length.


2 Answers

Obviously you would be better off with Delphi 2010, since the VCL in delphi 7 is not aware of Unicode strings. You can use WideString types, and WideChar types in Delphi 7, and you can install a component set like the TNT Unicode Components to help you create a user interface that can display your results.

For a very-large-set type, consider using a bit array like TBits. A bit array of length 65536 would hold enough to contain every UTF-16 code-point. Checking if Char X is in Set Y, would be basically:

function WideCharsInSet( wcstr:WideString; wcset:TBits):Boolean;
var
 n:Integer;
 wc:WideChar;
begin
result := false;
for n := 1 to Length(wcstr) do begin
  wc := wcstr[n];
  if wcset[Ord(wc)] then
      result := true;
end;
end;

procedure Demo;
var
 wcset1:TBits;
 s:WideString;
begin
 wcset1 := TBits.Create;
 try
  // 1157 - Hangul Korean codepoint I found with Char Map
    wcset1[1157] := true;         
    // go get a string value s:
    s := WideChar(1157);
// return true if at least one element in set wcset is found in string s:
    if WideCharsInSet(s,wcset1) then begin
        Application.MessageBox('Found it','found it',MB_OK);
    end;

 finally
  wcset1.Free;
 end;

end;
like image 79
Warren P Avatar answered Nov 02 '22 19:11

Warren P


For the simple processing of strings in the manner you describe, do not be put off by suggestions that you should upgrade to the latest compiler and Unicode enabled framework. The Unicode support itself is of course provided by the underlying Windows API which is of course (directly) accessible from "non-Unicode" versions of Delphi just as much as from "Unicode versions".

I suspect that most if not all of the Unicode support that you need for the purposes outlined in your question can be obtained from the Unicode support provided in the JEDI JCL.

For any visual component support you may require the TNT control set has the appeal of being free.

like image 23
Deltics Avatar answered Nov 02 '22 19:11

Deltics