
Why does TStringStream.DataString fail when TStringStream loads a binary (non-textual) file?

  1. As the experts have kindly pointed out, TStringStream.DataString cannot be used to retrieve non-text data loaded by TStringStream.LoadFromFile. TStringStream.GetDataString calls TEncoding's decoding methods. With TMBCSEncoding, for example, it calls TMBCSEncoding.GetChars, which in turn calls TMBCSEncoding.UnicodeFromLocaleChars and finally Windows's MultiByteToWideChar, and that conversion does not preserve arbitrary bytes.

  2. TBytes is recommended as the data buffer/binary storage type, in preference to AnsiString.

  3. The bytes can be retrieved with the TStringStream.ReadBuffer method or through the TStringStream.Bytes property. Either way, TStream.Size must be taken into account: only the first Size bytes are data, and Bytes may be longer than that.
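Points 2 and 3 can be sketched as follows. This is a minimal, hedged example assuming Delphi XE's SysUtils and Classes units; LoadFileBytes, SaveFileBytes and the file name sample.bin are hypothetical names chosen for illustration. It reads a file into TBytes with TMemoryStream.ReadBuffer, sizing the buffer from Stream.Size, so no string or TEncoding conversion is ever involved.

```pascal
program LoadBinaryIntoBytes;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads a whole file into a TBytes buffer. No string
// or TEncoding is involved, so every byte value (including #0 and bytes
// that are invalid in the ANSI code page) is kept unchanged.
function LoadFileBytes(const FileName: string): TBytes;
var
  Stream: TMemoryStream;
  Count: Integer;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.LoadFromFile(FileName);
    Count := Stream.Size;  // length of the real data, not the buffer capacity
    SetLength(Result, Count);
    if Count > 0 then
    begin
      Stream.Position := 0;
      Stream.ReadBuffer(Result[0], Count);
    end;
  finally
    Stream.Free;
  end;
end;

// Hypothetical helper: writes a TBytes buffer to a file unchanged.
procedure SaveFileBytes(const FileName: string; const Data: TBytes);
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Data) > 0 then
      Stream.WriteBuffer(Data[0], Length(Data));
  finally
    Stream.Free;
  end;
end;

var
  Original, Loaded: TBytes;
  I: Integer;
begin
  // 257 bytes: an odd length that covers every byte value 0..255
  SetLength(Original, 257);
  for I := 0 to High(Original) do
    Original[I] := Byte(I mod 256);
  SaveFileBytes('sample.bin', Original);  // hypothetical file name
  Loaded := LoadFileBytes('sample.bin');
  Writeln('Identical: ', (Length(Loaded) = Length(Original)) and
    CompareMem(@Loaded[0], @Original[0], Length(Original)));
end.
```

The same LoadFileBytes call would work on REF_DecodedSample; the resulting TBytes can then be handed to a Base64 encoder without passing through DataString.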

====================================================

I am trying to use TStringStream and its DataString property for Base64 encoding/decoding. This seems possible, as indicated by Nils Haeck's replies here and here.

  1. Using TStringStream.DataString in TMainForm.QuestionOfString_StringStream (No. 2 to No. 7) fails: the information is corrupted (i.e., it is not the same as the original). However, ss_loaded_2.SaveToFile (No. 1) saves the original information. Does this indicate that TStringStream holds the decoded non-textual data correctly internally? Could you comment on the possible reasons for the DataString corruption?

  2. In Rob Kennedy's kind answer, he mentioned that string and AnsiString should be avoided for storing Base64-decoded non-textual data, which makes great sense. However, as shown in TMainForm.QuestionOfString_NativeXML, DecString (of type AnsiString) holds the decoded bytes so correctly that the data can be encoded back. Does this mean AnsiString can hold decoded non-textual data intact?

  3. David Heffernan and Rob Kennedy have kindly commented on bytes/TBytes. However, the bytes extracted in TMainForm.QuestionOfString_NativeXML_Bytes_1 are different from TStringStream's Bytes in TMainForm.QuestionOfString_NativeXML_Bytes_2. (Judging from the Base64 encoding/decoding results, TStringStream.Bytes is wrong. This is confusing because, based on the paragraph above, TStringStream should contain the intact bytes internally.) Could you comment on the possible reason?

Thank you very much for your help!

PS: The sample files can be downloaded from SkyDrive: REF_EncodedSample & REF_DecodedSample (a zlib-compressed image file).

PS: Delphi XE, Windows 7. (It seems TStringStream back in Delphi 7 doesn't have LoadFromFile or SaveToFile.)

Sample code:

unit uMainForm;

interface

uses
  CodeSiteLogging,
  NativeXml, // v3.10
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs;

type
  TMainForm = class(TForm)
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
    procedure QuestionOfString_StringStream;
    procedure QuestionOfString_NativeXML;
    procedure QuestionOfString_NativeXML_Bytes_1;
    procedure QuestionOfString_NativeXML_Bytes_2;
  public
    { Public declarations }
  end;

var
  MainForm: TMainForm;

implementation

{$R *.dfm}    

// http://stackoverflow.com/questions/773297/how-can-i-convert-tbytes-to-rawbytestring
function Convert(const Bytes: TBytes): RawByteString;
begin
  SetLength(Result, Length(Bytes));
  if Length(Bytes) > 0 then
  begin
    Move(Bytes[0], Result[1], Length(Bytes));
    // SetCodePage(Result, CP_ACP, False);
  end;
end;

procedure TMainForm.FormCreate(Sender: TObject);
begin
  QuestionOfString_StringStream;
  QuestionOfString_NativeXML;
  QuestionOfString_NativeXML_Bytes_1;
  QuestionOfString_NativeXML_Bytes_2;
end;

// http://www.delphigroups.info/2/3/321962.html
// http://borland.newsgroups.archived.at/public.delphi.graphics/200712/0712125679.html
procedure TMainForm.QuestionOfString_StringStream;
var
  ss_loaded_2, ss_loaded_3: TStringStream;
  dataStr: AnsiString;
  hexOfDataStr: AnsiString;
begin
  ss_loaded_2 := TStringStream.Create();
  // load the file containing Base64-decoded sample data
  ss_loaded_2.LoadFromFile('REF_DecodedSample');

  // 1  
  ss_loaded_2.SaveToFile('REF_DecodedSample_1_SavedByStringStream');

  // 2 
  ss_loaded_3 := TStringStream.Create(ss_loaded_2.DataString);
  ss_loaded_3.SaveToFile('REF_DecodedSample_2_SavedByStringStream');

  // 3     
  ss_loaded_3.Free;
  ss_loaded_3 := TStringStream.Create(ss_loaded_2.DataString, TEncoding.ASCII);
  ss_loaded_3.SaveToFile('REF_DecodedSample_3_SavedByStringStream');

  // 4     
  ss_loaded_3.Free;
  ss_loaded_3 := TStringStream.Create(ss_loaded_2.DataString, TEncoding.UTF8);
  ss_loaded_3.SaveToFile('REF_DecodedSample_4_SavedByStringStream');

  // 5     
  ss_loaded_3.Free;
  ss_loaded_3 := TStringStream.Create(AnsiString(ss_loaded_2.DataString));
  ss_loaded_3.SaveToFile('REF_DecodedSample_5_SavedByStringStream');

  // 6     
  ss_loaded_3.Free;
  ss_loaded_3 := TStringStream.Create(UTF8String(ss_loaded_2.DataString));
  ss_loaded_3.SaveToFile('REF_DecodedSample_6_SavedByStringStream');

  // 7 
  dataStr := ss_loaded_2.DataString;
  SetLength(hexOfDataStr, 2 * Length(dataStr));
  BinToHex(@dataStr[1], PAnsiChar(@hexOfDataStr[1]), Length(dataStr));
  CodeSite.Send(hexOfDataStr);

  ss_loaded_2.Free;
  ss_loaded_3.Free;
end;

// http://www.simdesign.nl/forum/viewtopic.php?f=2&t=1311
procedure TMainForm.QuestionOfString_NativeXML;
var
  LEnc, LDec: integer;
  EncStream: TMemoryStream;
  DecStream: TMemoryStream;
  EncString: AnsiString;
  DecString: AnsiString;
begin
  // encode and decode streams
  EncStream := TMemoryStream.Create;
  DecStream := TMemoryStream.Create;
  try
    // load BASE64-encoded data
    EncStream.LoadFromFile('REF_EncodedSample');
    LEnc := EncStream.Size;
    SetLength(EncString, LEnc);
    EncStream.Read(EncString[1], LEnc);

    // decode BASE64-encoded data, after removing control chars
    DecString := DecodeBase64(sdRemoveControlChars(EncString));
    LDec := length(DecString);
    DecStream.Write(DecString[1], LDec);

    // save the decoded data
    DecStream.SaveToFile('REF_DecodedSample_7_SavedByNativeXml');

    // EncString := sdAddControlChars(EncodeBase64(DecString), #$0D#$0A);
    EncString := EncodeBase64(DecString);

    // clear and resave encode stream as a copy
    EncStream.Clear;
    EncStream.Write(EncString[1], Length(EncString));
    EncStream.SaveToFile('REF_EncodedSampleCopy');

  finally
    EncStream.Free;
    DecStream.Free;
  end;
end;

procedure TMainForm.QuestionOfString_NativeXML_Bytes_1;
var
  LEnc, LDec: integer;
  EncStream: TMemoryStream;
  DecStream: TMemoryStream;
  EncString: AnsiString;
  DecString: AnsiString;
  DecBytes: TBytes;
begin
  // encode and decode streams
  EncStream := TMemoryStream.Create;
  DecStream := TMemoryStream.Create;
  try
    // load BASE64-decoded data
    DecStream.LoadFromFile('REF_DecodedSample');

    LDec := DecStream.Size;
    SetLength(DecBytes, LDec);
    DecStream.Read(DecBytes[0], LDec);

    EncString := EncodeBase64(Convert(DecBytes));

    // clear and resave encode stream as a copy
    EncStream.Write(EncString[1], Length(EncString));
    EncStream.SaveToFile('REF_EncodedSampleCopy_Bytes_1');

  finally
    EncStream.Free;
    DecStream.Free;
  end;
end;

procedure TMainForm.QuestionOfString_NativeXML_Bytes_2;
var
  LEnc, LDec: integer;
  EncStream: TMemoryStream;
  DecStream: TStringStream;
  EncString: AnsiString;
  DecString: AnsiString;
  DecBytes: TBytes;
begin
  // encode and decode streams
  EncStream := TMemoryStream.Create;
  DecStream := TStringStream.Create;
  try
    // load BASE64-decoded data
    DecStream.LoadFromFile('REF_DecodedSample');

    DecBytes := DecStream.Bytes;

    EncString := EncodeBase64(Convert(DecBytes));

    // clear and resave encode stream as a copy
    EncStream.Write(EncString[1], Length(EncString));
    EncStream.SaveToFile('REF_EncodedSampleCopy_Bytes_2');

  finally
    EncStream.Free;
    DecStream.Free;
  end;
end;

end.
asked Dec 16 '22 22:12 by SOUser


2 Answers

It's really no surprise that examples 3 through 7 fail. Your file is not textual data, so storing it in a text data structure is bound to show problems. Each of those tests involves converting the data from one encoding to another. Since your data isn't encoded as UTF-16 text to begin with, any conversion that expects the data to have that encoding is going to fail.
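The effect of such a conversion can be sketched outside TStringStream. This is a minimal, hedged example assuming Delphi 2009+ (TEncoding and Exit(Value) in SysUtils/System); SurvivesRoundTrip and SampleBinary are hypothetical names. It decodes bytes to a string and re-encodes them, roughly what happens when data leaves through DataString and comes back through a string-typed TStringStream constructor. TEncoding.UTF8 is used instead of the question's default ANSI encoding because the ANSI result depends on the system code page, while invalid UTF-8 is lost on any system.

```pascal
program EncodingRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Decodes Data as text and re-encodes it, roughly what happens when bytes
// go out through TStringStream.DataString and back in through
// TStringStream.Create(AString, AEncoding). Returns True only if every
// byte survives unchanged.
function SurvivesRoundTrip(const Data: TBytes; Encoding: TEncoding): Boolean;
var
  Text: string;
  Back: TBytes;
begin
  try
    Text := Encoding.GetString(Data);
    Back := Encoding.GetBytes(Text);
  except
    on EEncodingError do
      Exit(False);  // some Delphi versions reject invalid input outright
  end;
  Result := (Length(Back) = Length(Data)) and
    ((Length(Data) = 0) or CompareMem(@Back[0], @Data[0], Length(Data)));
end;

// Hypothetical sample: bytes of the kind found in compressed data.
function SampleBinary: TBytes;
begin
  SetLength(Result, 4);
  Result[0] := $89;  // lone UTF-8 continuation byte: invalid on its own
  Result[1] := $FF;  // never valid anywhere in UTF-8
  Result[2] := $00;
  Result[3] := $C3;  // starts a two-byte sequence that is cut off
end;

begin
  Writeln('Binary survives UTF-8: ',
    SurvivesRoundTrip(SampleBinary, TEncoding.UTF8));
  Writeln('Text survives UTF-8:   ',
    SurvivesRoundTrip(TEncoding.UTF8.GetBytes('Base64 text'), TEncoding.UTF8));
end.
```

Depending on the Delphi version, the invalid bytes are replaced, dropped, or rejected with EEncodingError; in every case the original bytes do not come back, whereas genuine text does.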

Example 2 probably fails because you have an odd number of bytes, and you're storing it in a string that by definition contains an even number of bytes. Somewhere, a byte is going to be introduced or dropped, causing different data to be stored.

Unless you're dealing with text, don't use TStringStream, string, or AnsiString. Try TBytesStream or TMemoryStream instead.

Feel free to store Base64-encoded data in a string. Base64 is a text format. But once you decode it, it's binary again, and has no business being in a text data structure anymore.
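Keeping the decoded side binary can be sketched with streams only. This is a hedged example that swaps in the RTL's EncdDecd unit (EncodeStream/DecodeStream) in place of the NativeXml functions used in the question; the unit name EncdDecd is an assumption for Delphi XE (it is Soap.EncdDecd from XE2 on), and Base64RoundTrip is a hypothetical helper. The Base64 text lives in one TMemoryStream and the decoded bytes in another, so no string ever holds the binary data.

```pascal
program Base64ViaStreams;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes,
  EncdDecd;  // assumption: Delphi XE unit name (Soap.EncdDecd from XE2 on)

// Hypothetical helper: encodes Data as Base64 text and decodes it back.
// Base64 is text and may live in a stream or string; the decoded result
// stays as raw bytes in a TMemoryStream and is returned as TBytes.
function Base64RoundTrip(const Data: TBytes): TBytes;
var
  Binary, Encoded, Decoded: TMemoryStream;
begin
  Binary := TMemoryStream.Create;
  Encoded := TMemoryStream.Create;
  Decoded := TMemoryStream.Create;
  try
    if Length(Data) > 0 then
      Binary.WriteBuffer(Data[0], Length(Data));
    Binary.Position := 0;
    EncodeStream(Binary, Encoded);   // binary -> Base64 text
    Encoded.Position := 0;
    DecodeStream(Encoded, Decoded);  // Base64 text -> binary
    SetLength(Result, Decoded.Size);
    if Decoded.Size > 0 then
    begin
      Decoded.Position := 0;
      Decoded.ReadBuffer(Result[0], Decoded.Size);
    end;
  finally
    Decoded.Free;
    Encoded.Free;
    Binary.Free;
  end;
end;

var
  Data, Back: TBytes;
  I: Integer;
begin
  SetLength(Data, 1001);  // odd length, every byte value represented
  for I := 0 to High(Data) do
    Data[I] := Byte(I mod 256);
  Back := Base64RoundTrip(Data);
  Writeln('Round trip intact: ', (Length(Back) = Length(Data)) and
    CompareMem(@Back[0], @Data[0], Length(Data)));
end.
```

For the question's files, the Encoded stream would instead be filled with Encoded.LoadFromFile('REF_EncodedSample') and Decoded saved with SaveToFile.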

The reason you see different results now from what Nils Haeck suggested you should expect is that Haeck was writing in 2007, before Delphi strings became Unicode and the RTL did any automatic code-page conversions. You're using Delphi XE, where string is UnicodeString.

answered Dec 28 '22 06:12 by Rob Kennedy


You are not taking into account that in D2009+ TStringStream derives from TBytesStream, which in turn derives from TMemoryStream, whereas in earlier versions it derived directly from TStream. TMemoryStream allocates memory differently than your code expects: the TBytesStream.Bytes property represents the entire memory block that TMemoryStream allocates, but that does not mean the entire block is filled with actual data. There is some extra padding involved, which your code needs to ignore.
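Ignoring the padding amounts to copying only the first Size bytes out of Bytes. This is a minimal, hedged sketch assuming Delphi 2009+ Classes/SysUtils; ValidBytes, ReportFor and TLengthReport are hypothetical names. It writes a few bytes into an empty TBytesStream and compares Size, Length(Bytes) and the trimmed copy.

```pascal
program BytesVersusSize;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical record used only to report the three lengths side by side.
  TLengthReport = record
    DataSize: Int64;         // Stream.Size: bytes actually written
    BufferLength: Integer;   // Length(Stream.Bytes): the whole internal buffer
    TrimmedLength: Integer;  // Length(ValidBytes(Stream)): data only
  end;

// Returns only the valid data held by a TBytesStream (TStringStream derives
// from TBytesStream in D2009+). Stream.Bytes is the entire buffer, which can
// be longer than the data; only the first Stream.Size bytes are meaningful.
function ValidBytes(Stream: TBytesStream): TBytes;
var
  Count: Integer;
begin
  Count := Stream.Size;
  Result := Copy(Stream.Bytes, 0, Count);
end;

function ReportFor(const Data: TBytes): TLengthReport;
var
  Stream: TBytesStream;
begin
  Stream := TBytesStream.Create(nil);  // start empty, then grow by writing
  try
    if Length(Data) > 0 then
      Stream.WriteBuffer(Data[0], Length(Data));
    Result.DataSize := Stream.Size;
    Result.BufferLength := Length(Stream.Bytes);
    Result.TrimmedLength := Length(ValidBytes(Stream));
  finally
    Stream.Free;
  end;
end;

var
  Report: TLengthReport;
begin
  Report := ReportFor(TEncoding.UTF8.GetBytes('hello'));
  Writeln('Size:           ', Report.DataSize);
  Writeln('Length(Bytes):  ', Report.BufferLength);
  Writeln('Trimmed length: ', Report.TrimmedLength);
end.
```

Length(Bytes) is typically larger than Size here because the stream grows its capacity in blocks; in QuestionOfString_NativeXML_Bytes_2, using ValidBytes(DecStream) instead of DecStream.Bytes would pass only the real data to the encoder.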

See my answer to your other question for a more detailed explanation as to why your code is failing.

answered Dec 28 '22 06:12 by Remy Lebeau