
Does a string's length equal its byte size?

Tags: string, byte

Exactly that: does a string's length equal its byte size? Does it depend on the language?

I think it does, but I just want to make sure.

Additional Info: I'm just wondering in general. My specific situation was PHP with MySQL.

As the answer is no, that's all I need to know.

Darryl Hein asked Jan 03 '09 20:01


4 Answers

Nope. A zero-terminated string has one extra byte for the terminator. A Pascal string (the Delphi ShortString) has an extra byte for the length. And Unicode strings can take more than one byte per character.

With Unicode, it depends on the encoding. It could be 2 or 4 bytes per character, or even a mix of 1, 2, 3 and 4 bytes.
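To make the variable widths concrete, here is a minimal Python sketch (Python is my choice for illustration, not something from the question). The helper names `utf8_widths` and `c_string_size` are hypothetical, and the terminator calculation assumes a single-byte NUL, which holds for ASCII and UTF-8 but not for UTF-16 or UTF-32.

```python
def utf8_widths(text: str) -> list[int]:
    """Return the number of UTF-8 bytes used by each code point in text."""
    return [len(ch.encode("utf-8")) for ch in text]


def c_string_size(text: str, encoding: str = "utf-8") -> int:
    """Bytes needed to store text as a zero-terminated string.

    Assumes a one-byte terminator (true for ASCII and UTF-8).
    """
    return len(text.encode(encoding)) + 1


# a, e-acute (U+00E9), euro sign (U+20AC), grinning face (U+1F600)
sample = "a\u00e9\u20ac\U0001F600"

print(len(sample))            # 4 code points
print(utf8_widths(sample))    # [1, 2, 3, 4]
print(c_string_size(sample))  # 11 = 10 encoded bytes + 1 terminator
```

So the same four-character string has a "length" of 4, an encoded size of 10 bytes, and a stored size of 11 bytes once the terminator is counted.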

Toon Krijthe answered Sep 25 '22 10:09


It entirely depends on the platform and representation.

For example, in .NET a string takes two bytes in memory per UTF-16 code unit. However, a full Unicode character in the range U+10000 to U+10FFFF requires a surrogate pair, that is, two UTF-16 code units. The in-memory form also has an overhead for the length of the string and possibly some padding, as well as the normal object overhead of a type pointer etc.
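As a rough illustration of the difference between code points and UTF-16 code units, the following Python sketch counts both. It does not run on .NET; it only mimics the counting by encoding to UTF-16. The function name `utf16_code_units` is a hypothetical helper.

```python
def utf16_code_units(text: str) -> int:
    """Count the UTF-16 code units (2 bytes each) needed for text."""
    # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" prepends.
    return len(text.encode("utf-16-le")) // 2


bmp = "\u00e9"         # e-acute, at or below U+FFFF: one code unit
astral = "\U0001F600"  # grinning face, above U+FFFF: needs a surrogate pair

print(len(bmp), utf16_code_units(bmp))        # 1 1
print(len(astral), utf16_code_units(astral))  # 1 2
```

Python's `len` counts code points, so the second string has length 1 here, while a length measured in UTF-16 code units would be 2 and its UTF-16 payload is 4 bytes.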

Now, when you write a string out to disk (or the network, etc.) from .NET, you specify the encoding (with most classes defaulting to UTF-8). At that point, the size depends very much on the encoding. ASCII always takes a single byte per character, but is very limited (no accents, etc.); UTF-8 gives the full Unicode range with a variable-width encoding (all ASCII characters are represented in a single byte, but others take up more); UTF-32 always uses exactly 4 bytes for any Unicode character. The list goes on.
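The same comparison can be sketched in Python by encoding one string several ways and measuring the result. The helper `encoded_size` is hypothetical; it returns `None` when the chosen encoding cannot represent the text at all, which is what happens with ASCII and accented letters.

```python
from typing import Optional


def encoded_size(text: str, encoding: str) -> Optional[int]:
    """Return the byte count of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


text = "h\u00e9llo"  # "hello" with an e-acute: 5 characters

for name in ("ascii", "utf-8", "utf-16-le", "utf-32-le"):
    print(name, encoded_size(text, name))
# ascii None
# utf-8 6
# utf-16-le 10
# utf-32-le 20
```

The "-le" variants are used so that no byte-order mark is added; with plain "utf-16" or "utf-32" Python prepends one, which makes the output 2 or 4 bytes larger.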

As you can see, it's not a simple topic. To work out how much space a string is going to take up you'll need to specify exactly what the situation is - whether it's an object in memory on some platform (and if so, which platform - potentially even down to the implementation and operating system settings), or whether it's a raw encoded form such as a text file, and if so using which encoding.
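For the in-memory case, a quick way to see that object overhead exists is to compare the reported object size with the encoded payload. This sketch assumes CPython, where `sys.getsizeof` reports the size of the string object including its header; the exact numbers vary by version and platform, so none are shown. The name `memory_overhead` is a hypothetical helper.

```python
import sys


def memory_overhead(text: str, encoding: str = "utf-8") -> int:
    """Difference between the in-memory object size and the encoded payload size.

    Assumes CPython; other implementations may report sizes differently.
    """
    return sys.getsizeof(text) - len(text.encode(encoding))


for s in ("", "abc", "h\u00e9llo"):
    # The object is always larger than the raw encoded bytes on CPython.
    print(repr(s), sys.getsizeof(s), memory_overhead(s))
```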

Jon Skeet answered Sep 22 '22 10:09


It depends on what you mean by "length". If you mean "number of characters", then no: many languages and encodings use more than one byte per character.

Steven Robbins answered Sep 22 '22 10:09


Not always; it depends on the encoding.

Malfist answered Sep 23 '22 10:09