I hope this question is appropriate for stackoverflow... What is the difference between storing raw data bytes (8 bits) in a std::string and storing them in a std::vector<char>? I'm reading binary data from a file and storing those raw bytes in a std::string. This works well, there are no problems or issues with doing this. My program works as expected. However, other programmers prefer the std::vector<char> approach and suggest I stop using std::string as it's unsafe for raw bytes. So I'm wondering why it might be unsafe to use std::string to hold raw data bytes? I know std::string is most often used to store ASCII text, but a byte is a byte, so I don't understand the preference for std::vector<char>.
Thanks for any advice!
The problem is not really whether it works or it doesn't. The problem is that it is utterly confusing for the next guy reading your code. std::string is meant for displaying text. Anybody reading your code will expect that. You'll declare your intent much better with a std::vector<char>.
It increases your WTF/min in code reviews.
In C++03, using std::string to store an array of byte data was not a good idea. By the standard, std::string did not have to store its data contiguously. C++11 fixed that, so its data does have to be contiguous.
So in C++03 this was not guaranteed to work, unless you had personally vetted your C++ standard library's implementation of std::string to ensure that it is contiguous.
Either way, I would suggest vector<char>. Generally, when you see string, you expect it to be a... string. You know, a sequence of characters in some form of encoding. A vector<char> makes it obvious that it isn't a string, but an array of bytes.
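Since the question is about bytes read from a file, here is a minimal sketch of loading a file straight into a vector<char>. The helper name read_bytes is made up for illustration and is not from the question; treat it as one possible approach rather than the way to do it.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): read a whole file as raw bytes.
std::vector<char> read_bytes(const std::string& path) {
    // std::ios::binary disables newline translation on platforms that do it.
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // istreambuf_iterator reads unformatted characters, so null bytes and
    // whitespace are copied like any other byte.
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

The return type alone tells a reader that the result is a byte buffer and not text, which is the point being made above.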
Besides contiguous storage and code-clarity issues, I ran into some fairly insidious errors trying to use std::string to hold raw bytes.
Most of them centered around trying to convert a char array of bytes to std::string when interfacing with C libraries. For example:
std::string password = "pass\0word";
std::cout << password.length() << std::endl; // prints 4, not 9
Maybe you can fix that by specifying the length:
std::string password("pass\0word", 0, 9);
std::cout << password.length() << std::endl; // nope! still 4!
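This is because the three-argument form used here is the substring constructor (source, position, length): the literal is first read as a C-string, which stops at the first null byte, and only then is the substring taken. There might be a better way, but I ended up with this: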
std::string password("pass0word", 0, 9);
password[4] = '\0';
std::cout << password.length() << std::endl; // hurray! 9!
A little clunky. Thankfully I found this in unit testing, but I would have missed it if my test vectors didn't have null bytes. What makes this insidious is that the second approach above will work fine until the array contains a null byte.
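A less clunky option, as far as I can tell, is the (pointer, count) constructor of std::string, which copies exactly count chars and does not stop at a null byte. A small sketch, where bytes_to_string is a hypothetical helper name used only for illustration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::string from a buffer whose length is
// known, without treating the buffer as a null-terminated C-string.
std::string bytes_to_string(const char* data, std::size_t len) {
    // The (pointer, count) constructor copies exactly len chars,
    // embedded '\0' bytes included.
    return std::string(data, len);
}
```

With this, bytes_to_string("pass\0word", 9).length() is 9. The catch is the same as before: the caller has to carry the length around separately, and forgetting to do so silently falls back to C-string behavior.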
So far std::vector<uint8_t> looks like a good option (thanks J.N. and Hurkyl):
char p[] = "pass\0word";
std::vector<uint8_t> password(p, p+9); // iterator range [p, p+9) :)
Note: I haven't tried the iterator constructor with std::string, but this error is easy enough to make that it might be worth avoiding even the possibility.
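For comparison, here is a sketch of the iterator-range constructor applied to both containers. The names kRaw, kRawLen, as_string and as_vector are made up for this example, and the length is derived with sizeof instead of a hard-coded 9, on the assumption that the bytes live in a char array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// sizeof on a char array counts the terminator that the literal appends,
// so subtract one to get the number of payload bytes.
static const char kRaw[] = "pass\0word";
static const std::size_t kRawLen = sizeof(kRaw) - 1;

// Iterator-range constructor: copies every char in [first, last),
// null bytes included.
std::string as_string() {
    return std::string(kRaw, kRaw + kRawLen);
}

// Same range, but the element type makes the "raw bytes" intent explicit.
std::vector<unsigned char> as_vector() {
    return std::vector<unsigned char>(kRaw, kRaw + kRawLen);
}
```

Both calls keep all 9 bytes, so the iterator form does work with std::string. The difference is that with the vector there is no C-string constructor to fall into by accident.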
Lessons learned:
Don't use std::string to hold raw bytes.