I am writing a class that will save wide strings to a binary file. I'm using Delphi 2005 for this, but the app will later be ported to Delphi 2010. I'm feeling very unsure here; can someone confirm that:
1. A Delphi 2005 WideString is exactly the same type as a Delphi 2010 String.
2. A Delphi 2005 WideString char, as well as a Delphi 2010 String char, is guaranteed to always be 2 bytes in size.
With all the Unicode formats out there, I don't want to be hit with one of the chars in my string suddenly being 3 bytes wide or something like that.
Edit: Found this: "I indeed said UnicodeString, not WideString. WideString still exists, and is unchanged. WideString is allocated by the Windows memory manager, and should be used for interacting with COM objects. WideString maps directly to the BSTR type in COM." at http://www.micro-isv.asia/2008/08/get-ready-for-delphi-2009-and-unicode/
Now I'm even more confused. So a Delphi 2010 WideString is not the same as a Delphi 2005 WideString? Should I use UnicodeString instead?
Edit 2: There's no UnicodeString type in Delphi 2005. FML.
WideString is similar to AnsiString, but instead of storing a string of AnsiChar characters, it stores a Unicode string of WideChar (16-bit) characters. WideString keeps track of its length and automatically appends a #0 character to the end of the string, so you can easily cast it to PWideChar.
The AnsiString data type is used to hold sequences of characters, like sentences. Each character is an AnsiChar, guaranteed to be 8 bits in size. An AnsiString can hold any number of characters, restricted only by memory. Unlike ShortStrings, AnsiStrings are pointer-referenced variables.
The "#13#10" part represents a carriage return + line feed combination: "#13" is the ASCII CR (carriage return) value, and #10 represents LF (line feed). Other interesting control characters include #0, the NULL character.
To get the size of a string in bytes, multiply its length by the size of one character:
var
  val: String;
begin
  val := 'example';
  ShowMessage(IntToStr(Length(val) * SizeOf(Char)));
end;
Or use ByteLength to obtain the size of a string in bytes. ByteLength calculates the size of the string by multiplying the number of characters in that string by the size of a character.
For your first question: WideString is not exactly the same type as D2010's string. WideString is the same COM BSTR type that it has always been. It's managed by Windows, with no reference counting, so it makes a copy of the whole BSTR every time you pass it somewhere.
UnicodeString, which is the default string type in D2009 and later, is basically a UTF-16 version of the AnsiString we all know and love. It has a reference count and is managed by the Delphi compiler.
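For the second, the default Char type is now WideChar, which is the same char type that has always been used in WideString. The encoding is UTF-16, 2 bytes per WideChar. (Strictly, that is 2 bytes per UTF-16 code unit: a character outside the Basic Multilingual Plane is stored as a surrogate pair of two WideChars and takes 4 bytes, but no single WideChar is ever 3 bytes wide.) If you save WideString data to a file, you can load it into a UnicodeString without trouble. The difference between the two types has to do with memory management, not the data format.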
As others mentioned, the string (actually UnicodeString) data type in Delphi 2009 and above is not equivalent to the WideString data type in previous versions, but the data content format is the same: both store the string as UTF-16. So if you save text using WideString in earlier versions of Delphi, you should be able to read it correctly using the string data type in recent versions of Delphi (2009 and above).
Note that the performance of UnicodeString is far superior to that of WideString. So if you are going to use the same source code in both Delphi 2005 and Delphi 2010, I suggest you use a string type alias with conditional compilation in your code, so that you get the best of both worlds:
type
  {$IFDEF UNICODE}
  MyStringType = UnicodeString;
  {$ELSE}
  MyStringType = WideString;
  {$ENDIF}
Now you can use MyStringType as the string type throughout your source code. If the compiler is Unicode (Delphi 2009 and above), your string type will be an alias of the UnicodeString type, which was introduced in Delphi 2009 to hold Unicode strings. If the compiler is not Unicode (e.g. Delphi 2005), your string type will be an alias for the old WideString data type. And since both are UTF-16, data saved by either version should be read correctly by the other.
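A minimal sketch of the save/load round trip, assuming a file layout of our own choosing: a 4-byte Integer character count followed by the raw UTF-16 code units (native little-endian on Windows/x86). SaveWide and LoadWide are hypothetical helper names, not RTL routines, and the program repeats the MyStringType alias so it is self-contained. Because both branches of the alias store 2-byte WideChar elements, the bytes written should be the same whichever compiler produced them.

```pascal
program WideStringRoundTrip;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // assumption: only needed for Free Pascal

uses
  Classes;

type
  // Same conditional alias as in the answer: UnicodeString on Delphi 2009+,
  // WideString on older compilers such as Delphi 2005.
  {$IFDEF UNICODE}
  MyStringType = UnicodeString;
  {$ELSE}
  MyStringType = WideString;
  {$ENDIF}

// Hypothetical helper: writes a 4-byte character count, then the raw
// UTF-16 code units (Length(S) * 2 bytes).
procedure SaveWide(Stream: TStream; const S: MyStringType);
var
  Len: Integer;
begin
  Len := Length(S); // count of WideChar code units, not bytes
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(S[1], Len * SizeOf(WideChar));
end;

// Hypothetical helper: reads back exactly what SaveWide wrote.
function LoadWide(Stream: TStream): MyStringType;
var
  Len: Integer;
begin
  Stream.ReadBuffer(Len, SizeOf(Len));
  SetLength(Result, Len);
  if Len > 0 then
    Stream.ReadBuffer(Result[1], Len * SizeOf(WideChar));
end;

var
  Mem: TMemoryStream;
  Original, Loaded: MyStringType;
begin
  Original := 'Caf';
  Original := Original + WideChar($00E9); // "Café": includes a non-ASCII char
  Mem := TMemoryStream.Create;
  try
    SaveWide(Mem, Original);
    Mem.Position := 0;
    Loaded := LoadWide(Mem);
    Writeln(Mem.Size);          // 12: 4-byte count + 4 chars * 2 bytes
    Writeln(Loaded = Original); // round trip should preserve the text
  finally
    Mem.Free;
  end;
end.
```

The explicit length prefix is a design choice: it avoids depending on the trailing #0 and lets the reader allocate the string in one SetLength call. Swapping TMemoryStream for a TFileStream gives the binary-file version.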
That is not true: for example, a Delphi 2010 string has a hidden internal code page field, but that probably does not matter for you.
That is true. In Delphi 2010, SizeOf(Char) = 2 (Char = WideChar).
There cannot be a different code page for Unicode strings: the code page field was introduced to create a common binary format for both Ansi strings (which need a code page field) and Unicode strings (which don't).
If you save WideString data to a stream in Delphi 2005 and load the same data into a string in Delphi 2010, everything should work.
WideString = BSTR, and that has not changed between Delphi 2005 and 2010.
UnicodeString = WideString in Delphi 2005 (if a UnicodeString type exists in Delphi 2005; I don't know), and UnicodeString = string in Delphi 2009 and above.
@Marco: Ansi and Unicode strings in Delphi 2009+ have a common binary format (12-byte header).
The UnicodeString code page is CP_UTF16 = 1200.
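On Delphi 2009 and later, the hidden header fields discussed here can be inspected with the System functions StringCodePage and StringElementSize. A minimal sketch, assuming Delphi 2009+ only: it will not compile on Delphi 2005, which has neither UnicodeString nor these functions.

```pascal
program StringHeaderDemo;

{$APPTYPE CONSOLE}

var
  U: UnicodeString;
  A: AnsiString;
begin
  U := 'example';
  A := 'example';
  // Code page stored in the UnicodeString header: 1200 (CP_UTF16).
  Writeln(StringCodePage(U));
  // Bytes per element: 2 for UnicodeString, 1 for AnsiString.
  Writeln(StringElementSize(U));
  Writeln(StringElementSize(A));
end.
```

The AnsiString code page is deliberately not printed, since it depends on the system's default Ansi code page.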