I had a similar question to this here: Delphi XE - should I use String or AnsiString? . After deciding that it is right to use ANSI strings in a (large) library of mine, I have realized that I can actually use RawByteString instead of ANSI. Because I mix UNICODE strings with ANSI strings, my code now has quite few places where it does conversions between them. However, it looks like if I use RawByteString I get rid of those conversions.
Please let me know your opinion about it.
Thanks.
Update:
This seems to be disappointing. It looks like the compiler still makes a conversion from RawByteString to string.
procedure TForm1.FormCreate(Sender: TObject);
var x1, x2: RawByteString;
s: string;
begin
x1:= 'a';
x2:= 'b';
x1:= x1+ x2;
s:= x1; { <------- Implicit string cast from 'RawByteString' to 'string' }
end;
I think it does some internal workings (such as copying data) and my code will not be much faster and I will still have to add lots of typecasts in my code in order to silence the compiler.
RawByteString
is an AnsiString
with no code page set by default.
When you assign another string
to this RawByteString
variable, you'll copy the code page of the source string
. And this will include a conversion. Sorry.
But there is one another use of RawByteString
, which is to store plain byte content (e.g. a database BLOB field content, just like an array of byte
)
To summarize:
RawByteString
should be used as a "code page agnostic" parameter to a method or function;RawByteString
can be used as a variable type to store some BLOB data.If you want to reduce conversion, and would rather use 8 bit char string
in your application, you should better:
AnsiString
type, which will depend on the current system code page, and by which you'll loose data; UnicodeString
;That exactly what we made for our framework. We wanted to use UTF-8 in its kernel because:
WideString
was not an option because it's dead slow and you've got the same problem of implicit conversions.But, in order to achieve best speed, we write some optimized functions to handle our custom string type:
{{ RawUTF8 is an UTF-8 String stored in an AnsiString
- use this type instead of System.UTF8String, which behavior changed
between Delphi 2009 compiler and previous versions: our implementation
is consistent and compatible with all versions of Delphi compiler
- mimic Delphi 2009 UTF8String, without the charset conversion overhead
- all conversion to/from AnsiString or RawUnicode must be explicit }
{$ifdef UNICODE} RawUTF8 = type AnsiString(CP_UTF8); // Codepage for an UTF8string
{$else} RawUTF8 = type AnsiString; {$endif}
/// our fast RawUTF8 version of Trim(), for Unicode only compiler
// - this Trim() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Trim(const S: RawUTF8): RawUTF8;
/// our fast RawUTF8 version of Pos(), for Unicode only compiler
// - this Pos() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Pos(const substr, str: RawUTF8): Integer; overload; inline;
And we reserved the RawByteString
type for handling BLOB data:
{$ifndef UNICODE}
/// define RawByteString, as it does exist in Delphi 2009/2010/XE
// - to be used for byte storage into an AnsiString
// - use this type if you don't want the Delphi compiler not to do any
// code page conversions when you assign a typed AnsiString to a RawByteString,
// i.e. a RawUTF8 or a WinAnsiString
RawByteString = AnsiString;
/// pointer to a RawByteString
PRawByteString = ^RawByteString;
{$endif}
/// create a File from a string content
// - uses RawByteString for byte storage, thatever the codepage is
function FileFromString(const Content: RawByteString; const FileName: TFileName;
FlushOnDisk: boolean=false): boolean;
Source code is available in our repository. In this unit, UTF-8 related functions were deeply optimized, with both version in pascal and asm for better speed. We sometimes overloaded default functions (like Pos
) to avoid conversion, or More information about how we handled text in the framework is available here.
Last word:
If you are sure that you will only have 7 bit content in your application (no accentuated characters), you may use the default AnsiString
type in your program. But in this case, you should better add the AnsiStrings
unit in your uses
clause to have overloaded string functions which will avoid most unwanted conversion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With