Suppose that for some perverse reason you want to display the raw byte contents of a UTF8String.
var
utf8Str : UTF8String;
begin
utf8Str := '€ąćęłńóśźż';
end;
(1) This doesn't do, it displays the readable form:
memo1.Lines.Add( RawByteString( utf8Str ));
// output: '€ąćęłńóśźż'
(2) This, however, does "work" - note the concatenation:
memo1.Lines.Add( 'x' + RawByteString( utf8Str ));
// output: 'x€ąćęłńóśźż'
I understand (1), though the compiler's forced coerction to UnicodeString seems to prevent ever displaying a RawByteString var as-is. However, why does the behavior change in (2)?
(3) Stranger still - let's reverse the concatenation:
memo1.Lines.Add( RawByteString( utf8Str ) + 'x' );
// output: '€ąćęłńóśźżx'
I've been reading up on the newfangled string types in Delphi and thought I understood how they work, but this is a puzzle.
RawByteString
only exists to minimize the number of overloads required for functions that work with various flavours of AnsiString
s with different codepage affinities.
In general, don't declare variables of type RawByteString
. Don't typecast values to that type. Don't do concatenations on variables of that type. About the only things you can do are:
StringCodePage
function.For example, you'll note that the StringCodePage
function itself uses RawByteString
as its argument type. This way, it will work with any AnsiString
, rather than doing a codepage translation before passing it as an argument.
For your case, things like concatenations are largely undefined. The behaviour changed between RTM and Update 2, but when the RTL string concatenation functions receive multiple strings with different code pages, there's no easy way for it to figure out what code page should be used for the final string. That's just one reason why you shouldn't concatenate them like you do here.
You cannot add a string to a TMemo "as is". You always need to so some kind of conversion to Unicode, because that's all TMemo knows about in Delphi 2009.
If you want to pretend that your UTF8String uses code page 1252, do this:
var
utf8Str : UTF8String;
Raw: RawByteString;
begin
utf8Str := '€ąćęłńóśźż';
Raw := utf8Str;
SetCodePage(Raw, 1252, False);
Memo.Lines.Add(Raw);
end;
For more details, see my article Using RawByteString Effectively
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With