 

Delphi 2009 + Unicode + Char-size

I just got Delphi 2009 and had previously read some articles about changes that might be necessary because of the switch to Unicode strings. The point mentioned most often is that SizeOf(Char) is no longer guaranteed to be 1. But why would this matter for string manipulation?

For example, if I assign 'Test' to an AnsiString and do the same with a String (which is Unicode now), Length() returns 4 in both cases, which is correct. I haven't tested it, but I'm sure all the other string manipulation functions behave the same way and decide internally whether the argument is a Unicode string or something else.
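A minimal Delphi 2009 console sketch of that example, assuming the default Delphi 2009+ mapping where `string` is `UnicodeString` and `Char` is a 2-byte `WideChar`. `Length` agrees for both types because it counts elements, but the number of bytes behind those elements does not:

```delphi
program LengthVsBytes;

{$APPTYPE CONSOLE}

var
  A: AnsiString;
  U: string; // UnicodeString in Delphi 2009 and later
begin
  A := 'Test';
  U := 'Test';
  // Length counts characters (elements), so both report 4
  Writeln('Length(A)  = ', Length(A));                     // 4
  Writeln('Length(U)  = ', Length(U));                     // 4
  // The memory behind those characters differs: 1 byte per AnsiChar, 2 per Char
  Writeln('Bytes in A = ', Length(A) * SizeOf(AnsiChar));  // 4
  Writeln('Bytes in U = ', Length(U) * SizeOf(Char));      // 8
end.
```

So the character-level functions keep working; the size of a Char only matters once a count has to be turned into bytes.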

Why would the actual size of a char be of interest to me when I do string manipulation? (Assuming, of course, that I use strings as strings and not to store other data.)

Thanks for any help! Holger

Holgerwa asked Sep 24 '08 08:09


2 Answers

With Unicode, SizeOf(SomeChar) <> Length(SomeChar): a single Char now occupies 2 bytes but still counts as one character. Essentially, the length of a string (in characters) is less than the sum of the sizes of its chars (in bytes). As long as you don't assume SizeOf(Char) = 1 or SizeOf(SomeString[x]) = 1 (both are FALSE now), and don't try to interchange bytes with chars, you shouldn't have any trouble. Anywhere you are doing something creative like stuffing bytes into Chars or strings, you will need to use AnsiString instead.

(SizeOf(SomeString) is still 4 on the 32-bit compiler, no matter the string's length, since a string variable is essentially a pointer with some compiler magic.)
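The sizes mentioned above can be checked directly. A minimal sketch, assuming the 32-bit Delphi 2009 compiler; on a 64-bit compiler the string reference would be 8 bytes instead of 4:

```delphi
program SizeOfChecks;

{$APPTYPE CONSOLE}

var
  S: string;      // UnicodeString in Delphi 2009+
  A: AnsiString;  // still one byte per element
begin
  S := 'Hello';
  A := 'Hello';
  Writeln('SizeOf(Char) = ', SizeOf(Char));  // 2: Char is now WideChar
  Writeln('SizeOf(S[1]) = ', SizeOf(S[1]));  // 2: each element of S is a Char
  Writeln('SizeOf(S)    = ', SizeOf(S));     // 4 on 32-bit: only the string reference
  Writeln('Length(S)    = ', Length(S));     // 5: counted in characters
  Writeln('SizeOf(A[1]) = ', SizeOf(A[1]));  // 1: AnsiChar is still one byte
end.
```

The last line is why AnsiString remains the fallback for code that really treats each element as a byte.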

Jim McKeeth answered Sep 20 '22 23:09


People often implicitly convert from characters to bytes in old Delphi code without really thinking about it. For example, when writing to a stream. When you write a string to a stream, you have to specify the number of bytes you write, but people often pass the character count instead. See this post from Chris Bensen for another example.
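A minimal sketch of the stream case, assuming you want the raw UTF-16 data of the string in the stream (if a file format requires ANSI or UTF-8, the string would have to be converted to that encoding first). `TStream.Write` takes a byte count, so the character count must be multiplied by `SizeOf(Char)`:

```delphi
program StreamBytes;

{$APPTYPE CONSOLE}

uses
  Classes;

var
  S: string;
  Stream: TMemoryStream;
begin
  S := 'Test';
  Stream := TMemoryStream.Create;
  try
    // Wrong in Delphi 2009+: Count is in bytes but Length(S) is in characters,
    // so only the first half of the string's data would be written:
    //   Stream.Write(PChar(S)^, Length(S));
    // Right: convert the character count into a byte count.
    Stream.Write(PChar(S)^, Length(S) * SizeOf(Char));
    Writeln('Bytes written: ', Stream.Size); // 8
  finally
    Stream.Free;
  end;
end.
```

In pre-2009 code the "wrong" line happened to work because SizeOf(Char) was 1, which is exactly why the bug is easy to miss.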

Another way people often make this implicit conversion in older code is by using a "string" to store binary data. In this case, they actually want bytes, but the data type expects characters. D2009 has a better type for this.
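The answer doesn't name the type it means; one common choice for binary data is `TBytes` (an `array of Byte` declared in SysUtils). A minimal sketch, with arbitrary example byte values, of keeping binary data in a byte array so that lengths are byte counts from the start:

```delphi
program BinaryBuffer;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Data: TBytes;
  Stream: TMemoryStream;
begin
  // Binary data lives in a byte array, not in a string of 2-byte Chars
  SetLength(Data, 4);
  Data[0] := $DE;
  Data[1] := $AD;
  Data[2] := $BE;
  Data[3] := $EF;
  Stream := TMemoryStream.Create;
  try
    // Length(Data) is already a byte count, so no SizeOf conversion is needed
    Stream.Write(Data[0], Length(Data));
    Writeln('Bytes written: ', Stream.Size); // 4
  finally
    Stream.Free;
  end;
end.
```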

Craig Stuntz answered Sep 21 '22 23:09