Delphi 2009 and above uses unicode strings for their default string type. To my understanding unicode char is actually 16 bit value or 2 bytes (note: I understand there is possibility of 3 or 4 bytes char, but let's consider the most usual case). However I found that TStringStream is not very reliable to manipulating this strings. For example, TStringStream.Size property returns the length of the string, while I think it should return the byte count of the contained string. Okay, you can adjust it on your own, but the thing that really confused me the most is: TStringStream does not read from or write to a buffer reliably.
Please check the following code (it's a DUnit test and always fail). Please let me know where the problem is (I was using D2010 when testing the code).
procedure TestTCPackage.TestStringStream;
const
cCount = 10;
cOrdMaxChar = Ord(High(Char));
var
B: Pointer;
SW, SR: TStringStream;
T: string;
i, j, k : Integer;
vStrings: array [0..cCount-1] of string;
begin
RandSeed := GetTickCount;
for i := 0 to cCount - 1 do
begin
j := Random(100) + 1;
SetLength(vStrings[i], j);
for k := 1 to j do
// fill string with random char (but no #0)
vStrings[i][k] := Char(Random(cOrdMaxChar-1) + 1);
end;
for i := 0 to cCount - 1 do
begin
SW := TStringStream.Create(vStrings[i]);
try
GetMem(B, SW.Size * SizeOf(Char));
try
SW.Read(B^, SW.Size * SizeOf(Char));
SR := TStringStream.Create;
try
SR.Write(B^, SW.Size * SizeOf(Char));
SR.Position := 0;
// check the string in the TStringStream with original value
Check(SR.DataString = vStrings[i]);
finally
SR.Free;
end;
finally
FreeMem(B);
end;
finally
SW.Free;
end;
end;
end;
Note: I already tried to use an instance of TMemoryStream as intermediary from reading/writing the buffer and use CopyFrom of the TStringStream to read the content of that TMemoryStream with same failing effect.
Unicode strings aren't for data storage; use TBytes
for that. TStringStream
uses its associated encoding (the Encoding
property) for encoding strings passed in with WriteString
, and decoding strings read out with ReadString
or the DataString
property.
After reading this post (and thanks to Serg who provided the answer to that question) and Barry Kelly's answer, I have found the problem. TStringStream is actually using ASCII/ansistring encoding by default. So even if your default string type is unicode, unless you spesifically tell it to, it won't use unicode encoding. Personally I think it's strange. Maybe for making it easier to convert old codes.
So you have to specifically set the encoding of the TStringStream to TEncoding.Unicode to manipulate unicode string properly.
Here is my modified code which passes DUnit test is:
procedure TestTCPackage.TestStringStream;
const
cCount = 10;
cOrdMaxChar = Ord(High(Char));
var
B: Pointer;
SW, SR: TStringStream;
i, j, k : Integer;
vStrings: array [0..cCount-1] of string;
begin
RandSeed := GetTickCount;
for i := 0 to cCount - 1 do
begin
j := Random(100) + 1;
SetLength(vStrings[i], j);
for k := 1 to j do
// fill string with random char (but no #0)
vStrings[i][k] := Char(Random(cOrdMaxChar-1) + 1);
end;
for i := 0 to cCount - 1 do
begin
SW := TStringStream.Create(vStrings[i], ***TEncoding.Unicode***);
try
GetMem(B, SW.Size);
try
SW.ReadBuffer(B^, SW.Size);
SR := TStringStream.Create('', ***TEncoding.Unicode***);
try
SR.WriteBuffer(B^, SW.Size);
SR.Position := 0;
// check the string in the TStringStream with original value
Check(SR.DataString = vStrings[i]);
finally
SR.Free;
end;
finally
FreeMem(B);
end;
finally
SW.Free;
end;
end;
end;
Last note: Unicode does bite! :D
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With