Unicode string and TStringStream

Delphi 2009 and above use Unicode strings as their default string type. To my understanding, a Unicode char is a 16-bit value, i.e. 2 bytes (note: I understand that some characters need two 16-bit code units, i.e. 4 bytes, but let's consider the most usual case). However, I found that TStringStream is not very reliable for manipulating these strings. For example, the TStringStream.Size property returns the length of the string, while I think it should return the byte count of the contained string. Okay, you can adjust for that on your own, but the thing that confused me the most is that TStringStream does not read from or write to a buffer reliably.

Please check the following code (it's a DUnit test and it always fails). Please let me know where the problem is (I was using D2010 when testing the code).

procedure TestTCPackage.TestStringStream;
const
  cCount = 10;
  cOrdMaxChar = Ord(High(Char));
var
  B: Pointer;
  SW, SR: TStringStream;
  T: string;
  i, j, k : Integer;
  vStrings: array [0..cCount-1] of string;
begin
  RandSeed := GetTickCount;
  for i := 0 to cCount - 1 do
  begin
    j := Random(100) + 1;
    SetLength(vStrings[i], j);
    for k := 1 to j do
      // fill string with random char (but no #0)
      vStrings[i][k] := Char(Random(cOrdMaxChar-1) + 1);
  end;

  for i := 0 to cCount - 1 do
  begin
    SW := TStringStream.Create(vStrings[i]);
    try
      GetMem(B, SW.Size * SizeOf(Char));
      try
        SW.Read(B^, SW.Size * SizeOf(Char));

        SR := TStringStream.Create;
        try
          SR.Write(B^, SW.Size * SizeOf(Char));
          SR.Position := 0;

          // check the string in the TStringStream with original value
          Check(SR.DataString = vStrings[i]);
        finally
          SR.Free;
        end;
      finally
        FreeMem(B);
      end;
    finally
      SW.Free;
    end;
  end;
end;

Note: I already tried using an instance of TMemoryStream as an intermediary for reading/writing the buffer, and then using CopyFrom on the TStringStream to read the content of that TMemoryStream, with the same failing result.

Luthfi asked Oct 11 '10 13:10


2 Answers

Unicode strings aren't for data storage; use TBytes for that. TStringStream uses its associated encoding (the Encoding property) for encoding strings passed in with WriteString, and decoding strings read out with ReadString or the DataString property.
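
To illustrate the point about TBytes and encodings, here is a minimal sketch (not part of the original answer) that converts a string to raw bytes and back with an explicit encoding. The program name and variable names are made up for the example, and the `uses` clause assumes the non-scoped unit names of Delphi 2009/2010; newer versions may need `System.SysUtils` instead.

```pascal
program BytesRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Original, Restored: string;
  Raw: TBytes;
begin
  // Three ASCII chars followed by two non-ASCII chars (given as UTF-16 code units)
  Original := 'abc'#$4E16#$754C;

  // Encode explicitly as UTF-16LE; GetBytes does not add a byte order mark
  Raw := TEncoding.Unicode.GetBytes(Original);

  // Each Char in this string is one UTF-16 code unit, so the byte count
  // is Length * SizeOf(Char)
  Assert(Length(Raw) = Length(Original) * SizeOf(Char));

  // Decode with the same encoding to get the original string back
  Restored := TEncoding.Unicode.GetString(Raw);
  Assert(Restored = Original);

  Writeln('Chars: ', Length(Original), ', bytes: ', Length(Raw));
end.
```

The same idea applies to TStringStream: the bytes it holds only map back to the original string if they are decoded with the encoding that produced them.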

Barry Kelly answered Nov 15 '22 03:11


After reading this post (thanks to Serg, who provided the answer to that question) and Barry Kelly's answer, I have found the problem. TStringStream actually uses the ANSI encoding (TEncoding.Default) by default. So even though the default string type is Unicode, the stream won't use a Unicode encoding unless you specifically tell it to. Personally I think that's strange. Maybe it was done to make converting old code easier.

So you have to explicitly set the encoding of the TStringStream to TEncoding.Unicode to manipulate Unicode strings properly.

Here is my modified code, which passes the DUnit test:

procedure TestTCPackage.TestStringStream;
const
  cCount = 10;
  cOrdMaxChar = Ord(High(Char));
var
  B: Pointer;
  SW, SR: TStringStream;
  i, j, k : Integer;
  vStrings: array [0..cCount-1] of string;
begin
  RandSeed := GetTickCount;
  for i := 0 to cCount - 1 do
  begin
    j := Random(100) + 1;
    SetLength(vStrings[i], j);
    for k := 1 to j do
      // fill string with random char (but no #0)
      vStrings[i][k] := Char(Random(cOrdMaxChar-1) + 1);
  end;

  for i := 0 to cCount - 1 do
  begin
    SW := TStringStream.Create(vStrings[i], TEncoding.Unicode);
    try
      GetMem(B, SW.Size);
      try
        SW.ReadBuffer(B^, SW.Size);

        SR := TStringStream.Create('', TEncoding.Unicode);
        try
          SR.WriteBuffer(B^, SW.Size);
          SR.Position := 0;

          // check the string in the TStringStream with original value
          Check(SR.DataString = vStrings[i]);
        finally
          SR.Free;
        end;
      finally
        FreeMem(B);
      end;
    finally
      SW.Free;
    end;
  end;
end;
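
As a variation on the test above, the raw GetMem/FreeMem buffer could be replaced by a TBytes array, in line with Barry Kelly's suggestion. This is only a sketch under that assumption: RoundTrip is a hypothetical helper name, and the `uses` clause assumes Delphi 2009/2010 unit names.

```pascal
program StringStreamRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: copies Value through two Unicode-encoded
// TStringStream instances using a TBytes buffer in between.
function RoundTrip(const Value: string): string;
var
  Source, Target: TStringStream;
  Buffer: TBytes;
begin
  Source := TStringStream.Create(Value, TEncoding.Unicode);
  try
    // With TEncoding.Unicode, Size is the byte count of the UTF-16 data
    SetLength(Buffer, Integer(Source.Size));
    if Length(Buffer) > 0 then
      Source.ReadBuffer(Buffer[0], Length(Buffer));
  finally
    Source.Free;
  end;

  Target := TStringStream.Create('', TEncoding.Unicode);
  try
    if Length(Buffer) > 0 then
      Target.WriteBuffer(Buffer[0], Length(Buffer));
    // DataString decodes the stored bytes with the stream's encoding
    Result := Target.DataString;
  finally
    Target.Free;
  end;
end;

const
  cSample = 'abc'#$4E16#$754C;
begin
  Assert(RoundTrip(cSample) = cSample);
  Assert(RoundTrip('') = '');
  Writeln('Round trip OK');
end.
```

Because TBytes is a managed dynamic array, no explicit FreeMem is needed, which removes one level of try/finally from the test.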

Last note: Unicode does bite! :D

Luthfi answered Nov 15 '22 04:11