
Length of a C++ std::string in bytes

I'm having some trouble figuring out the exact semantics of std::string.length(). The documentation explicitly points out that length() returns the number of characters in the string and not the number of bytes. I was wondering in which cases this actually makes a difference.

In particular, is this only relevant to non-char instantiations of std::basic_string<>, or can I also get into trouble when storing UTF-8 strings with multi-byte characters? Does the standard allow length() to be UTF-8-aware?

ComicSansMS asked Oct 12 '11 16:10


3 Answers

When dealing with non-char instantiations of std::basic_string<>, sure, length may not equal number of bytes. This is particularly evident with std::wstring:

std::wstring ws = L"hi";
std::cout << ws.length();     // <-- 2 (characters), not the byte count 2 * sizeof(wchar_t)

But std::string is about char characters; there is no such thing as a multi-byte character as far as std::string is concerned, whether you crammed one in at a high level or not. So, std::string.length() is always the number of bytes represented by the string. Note that if you're cramming multibyte "characters" into an std::string, then your definition of "character" suddenly becomes at odds with that of the container and of the standard.

Lightness Races in Orbit answered Sep 20 '22 14:09


If we are talking specifically about std::string, then length() does return the number of bytes.

This is because a std::string is a basic_string of chars, and the C++ Standard defines the size of one char to be exactly one byte.

Note that the Standard doesn't say how many bits are in a byte, but that's another story entirely and you probably don't care.

EDIT: The Standard does say that an implementation shall provide a definition for CHAR_BIT which says how many bits are in a byte.

By the way, if you go down a road where you do care how many bits are in a byte, you might consider reading this.

John Dibling


A std::string is std::basic_string<char>, so s.length() * sizeof(char) is the byte length, and since sizeof(char) is 1 by definition, that is just s.length(). Also, std::string knows nothing of UTF-8, so you're going to get the byte size even if that's not really what you're after.

If you have UTF-8 data in a std::string, you'll need to use something else such as ICU to get the "real" length.

NuSkooler