Delphi WideString and Delphi 2009+

I am writing a class that will save wide strings to a binary file. I'm using Delphi 2005 for this but the app will later be ported to Delphi 2010. I'm feeling very unsure here, can someone confirm that:

  1. A Delphi 2005 WideString is exactly the same type as a Delphi 2010 String

  2. A Delphi 2005 WideString char as well as a Delphi 2010 String char is guaranteed to always be 2 bytes in size.

With all the Unicode formats out there I don't want to be hit with one of the chars in my string suddenly being 3 bytes wide or something like that.

Edit: Found this: "I indeed said UnicodeString, not WideString. WideString still exists, and is unchanged. WideString is allocated by the Windows memory manager, and should be used for interacting with COM objects. WideString maps directly to the BSTR type in COM." at http://www.micro-isv.asia/2008/08/get-ready-for-delphi-2009-and-unicode/

Now I'm even more confused. So a Delphi 2010 WideString is not the same as a Delphi 2005 WideString? Should I use UnicodeString instead?

Edit 2: There's no UnicodeString type in Delphi 2005. FML.

David asked Nov 04 '10 12:11


3 Answers

For your first question: WideString is not exactly the same type as D2010's string. WideString is the same COM BSTR type that it has always been. It's managed by Windows, with no reference counting, so it makes a copy of the whole BSTR every time you pass it somewhere.

UnicodeString, which is the default string type in D2009 and on, is basically a UTF-16 version of the AnsiString we all know and love. It's got a reference count and is managed by the Delphi compiler.

For the second, the default Char type is now WideChar, which is the same character type that has always been used in WideString. It's a UTF-16 encoding, so every Char element is 2 bytes. If you save WideString data to a file, you can load it into a UnicodeString without trouble. The difference between the two types has to do with memory management, not the data format.
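To make the "same data format" point concrete, here is a minimal sketch of length-prefixed storage that should compile unchanged in both Delphi 2005 and Delphi 2010. The unit name, the helper names `WriteWideStr` and `ReadWideStr`, and the 4-byte length prefix are assumptions made for illustration; they are not part of the question or of any library.

```pascal
unit WideStrStreamSketch;

interface

uses
  Classes;

// Hypothetical helpers: the names and the file layout are assumptions.
procedure WriteWideStr(Stream: TStream; const S: WideString);
function ReadWideStr(Stream: TStream): WideString;

implementation

procedure WriteWideStr(Stream: TStream; const S: WideString);
var
  Len: Integer;
begin
  // Length counts 16-bit WideChar elements, not bytes.
  Len := Length(S);
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(Pointer(S)^, Len * SizeOf(WideChar));
end;

function ReadWideStr(Stream: TStream): WideString;
var
  Len: Integer;
begin
  Stream.ReadBuffer(Len, SizeOf(Len));
  if Len < 0 then
    raise EStreamError.Create('Invalid length prefix');
  SetLength(Result, Len);
  if Len > 0 then
    Stream.ReadBuffer(Pointer(Result)^, Len * SizeOf(WideChar));
end;

end.
```

In Delphi 2010 the result of `ReadWideStr` can be assigned directly to a `string` variable, since the compiler converts between WideString and UnicodeString and both hold UTF-16 data.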

Mason Wheeler answered Sep 30 '22 16:09


As others mentioned, the string (actually UnicodeString) data type in Delphi 2009 and above is not equivalent to the WideString data type in previous versions, but the data content format is the same. Both of them store the string in UTF-16. So if you save text using WideString in earlier versions of Delphi, you should be able to read it correctly using the string data type in recent versions of Delphi (2009 and above).

Note that the performance of UnicodeString is far superior to that of WideString. So if you are going to use the same source code in both Delphi 2005 and Delphi 2010, I suggest you use a string type alias with conditional compilation in your code, so that you can have the best of both worlds:

type
  // UNICODE is defined by the compiler in Delphi 2009 and above.
  {$IFDEF UNICODE}
  MyStringType = UnicodeString;
  {$ELSE}
  MyStringType = WideString;
  {$ENDIF}

Now you can use MyStringType as your string type in your source code. If the compiler is Unicode-enabled (Delphi 2009 and above), your string type is an alias of UnicodeString, which was introduced in Delphi 2009 to hold Unicode strings. If the compiler is not Unicode-enabled (e.g. Delphi 2005), your string type is an alias of the old WideString data type. Since both are UTF-16, data saved by either version should be read correctly by the other.

vcldeveloper answered Sep 30 '22 14:09


  1. A Delphi 2005 WideString is exactly the same type as a Delphi 2010 String

That is not true: for example, a Delphi 2010 string has a hidden internal codepage field. That probably does not matter for you, though.

  2. A Delphi 2005 WideString char as well as a Delphi 2010 String char is guaranteed to always be 2 bytes in size.

That is true. In Delphi 2010 SizeOf(Char) = 2 (Char = WideChar).
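The worry about a char "suddenly being 3 bytes wide" can be checked with a small console sketch. It relies on one fact about UTF-16 that the answers do not spell out: a code point outside the Basic Multilingual Plane is stored as a surrogate pair, that is, two WideChar elements of 2 bytes each, never one wider element. The program name is hypothetical; the code uses only WideString, so it should compile in both Delphi 2005 and Delphi 2010.

```pascal
program CharSizeCheck;
{$APPTYPE CONSOLE}

var
  S: WideString;
begin
  // U+1D11E (musical symbol G clef) lies outside the BMP,
  // so UTF-16 encodes it as the surrogate pair D834 DD1E.
  SetLength(S, 2);
  S[1] := WideChar($D834);
  S[2] := WideChar($DD1E);

  Writeln(SizeOf(WideChar));              // 2
  Writeln(Length(S));                     // 2 (elements, not code points)
  Writeln(Length(S) * SizeOf(WideChar));  // 4 (bytes of character data)
end.
```

So the byte size of the character data is always `Length(S) * SizeOf(WideChar)`, even when the text contains characters that need two elements.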


Unicode strings cannot have different codepages. The codepage field was introduced to create a common binary format for both Ansi strings (which need a codepage field) and Unicode strings (which don't need it).

If you save WideString data to a stream in Delphi 2005 and load the same data into a string in Delphi 2010, everything should work OK.

WideString = BSTR, and that has not changed between Delphi 2005 and 2010.

UnicodeString = WideString in Delphi 2005 (if the UnicodeString type exists in Delphi 2005; I don't know).

UnicodeString = string in Delphi 2009 and above.


@Marco - Ansi and Unicode strings in Delphi 2009+ have a common binary format (12-byte header).

UnicodeString codepage CP_UTF16 = 1200;
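The header fields mentioned above can be inspected at run time. This is a minimal sketch for Delphi 2009 and above only, assuming the System routines `StringCodePage` and `StringElementSize` are available in your version; the program name is hypothetical. It will not compile in Delphi 2005.

```pascal
program CodePageCheck;
{$APPTYPE CONSOLE}

var
  S: string;  // UnicodeString in Delphi 2009 and above
begin
  S := 'example';
  // Codepage stored in the string header; compare it with CP_UTF16 = 1200.
  Writeln(StringCodePage(S));
  // Size in bytes of one element of this string.
  Writeln(StringElementSize(S));
end.
```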

kludg answered Sep 30 '22 14:09