 

std::string or std::vector<char> to hold raw data

Tags:

c++

string

I hope this question is appropriate for stackoverflow... What is the difference between storing raw data bytes (8 bits) in a std::string rather than storing them in std::vector<char>. I'm reading binary data from a file and storing those raw bytes in a std::string. This works well, there are no problems or issues with doing this. My program works as expected. However, other programmers prefer the std::vector<char> approach and suggest I stop using std::string as it's unsafe for raw bytes. So I'm wondering why might it be unsafe to use std::string to hold raw data bytes? I know std::string is most often used to store ASCII text, but a byte is a byte, so I don't understand the preference of the std::vector<char>.

Thanks for any advice!

asked Mar 08 '12 23:03 by 01100110



3 Answers

The problem is not really whether it works or not. The problem is that it is utterly confusing for the next guy reading your code. std::string is meant for holding text, and anybody reading your code will expect that. You'll declare your intent much better with a std::vector<char>.

It increases your WTF/min in code reviews.
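As a minimal sketch of that intent (assuming C++11 or later; read_file_bytes is a hypothetical helper name, not something from the original answers), the question's binary file can be read straight into a std::vector<char>, so the signature alone tells readers the contents are raw bytes:

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads an entire file as raw bytes.
// The return type documents that the result is binary data, not text.
std::vector<char> read_file_bytes(const std::string& path) {
    // std::ios::binary disables newline translation on platforms that do it.
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // istreambuf_iterator reads unformatted characters, so '\0' bytes and
    // whitespace are kept exactly as they appear in the file.
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

The same iterator pair would fill a std::string just as well; the difference is only in what the type communicates to the next reader.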

answered Oct 21 '22 01:10 by J.N.



In C++03, using std::string to store an array of byte data was not a good idea. The standard did not require std::string to store its data contiguously. C++11 fixed that, so its storage must now be contiguous.

So in C++03 this was not portable, unless you had personally vetted your standard library's implementation of std::string to make sure its storage was contiguous.

Either way, I would suggest vector<char>. Generally, when you see string, you expect it to be a... string. You know, a sequence of characters in some form of encoding. A vector<char> makes it obvious that it isn't a string, but an array of bytes.
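To illustrate the contiguity point, here is a sketch assuming C++11 or later; fill_bytes is a hypothetical stand-in for any C API that writes into a caller-supplied buffer (fread(), recv() and the like). Both versions are valid under C++11, but only the vector one relied on a guarantee that C++03 already made:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a C API that writes `len` raw bytes into `out`.
// It writes 0, 1, 2, ..., so byte 0 is a '\0'.
void fill_bytes(char* out, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<char>(i % 256);
    }
}

// vector storage has been required to be contiguous since C++03,
// so writing through data() is fine.
std::vector<char> read_into_vector(std::size_t n) {
    std::vector<char> buf(n);
    fill_bytes(buf.data(), buf.size());
    return buf;
}

// Writing through &buf[0] also relies on contiguous storage, which the
// standard only guarantees for std::string from C++11 on.
std::string read_into_string(std::size_t n) {
    std::string buf(n, '\0');
    fill_bytes(&buf[0], buf.size());
    return buf;
}
```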

answered Oct 21 '22 03:10 by Nicol Bolas



Besides contiguous storage and code-clarity issues, I ran into some fairly insidious errors trying to use std::string to hold raw bytes.

Most of them centered around trying to convert a char array of bytes to std::string when interfacing with C libraries. For example:

std::string password = "pass\0word";
std::cout << password.length() << std::endl; // prints 4, not 9

Maybe you can fix that by specifying the length:

std::string password("pass\0word", 0, 9);
std::cout << password.length() << std::endl; // nope! still 4!

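This happens because that three-argument call is the substring constructor, basic_string(const basic_string& str, size_type pos, size_type count): the literal is first converted to a temporary std::string, which stops at the first null byte, and then up to 9 characters of that 4-character string are copied. The pointer-plus-count constructor, std::string password("pass\0word", 9);, keeps all 9 bytes. Before I knew that, I ended up with this: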

std::string password("pass0word", 0, 9);
password[4] = '\0';
std::cout << password.length() << std::endl; // hurray! 9!

A little clunky. Thankfully I found this in unit testing, but I would have missed it if my test vectors didn't have null bytes. What makes this insidious is that the second approach above will work fine until the array contains a null byte.
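For comparison, a sketch of the pointer-plus-count constructor, assuming C++11; from_c_buffer and make_password are hypothetical names. That constructor copies exactly the number of bytes it is given and does not stop at '\0':

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for bytes handed over by a C library as
// (pointer, length). The (const char*, size_type) constructor copies exactly
// `len` bytes, embedded '\0' bytes included.
std::string from_c_buffer(const char* buf, std::size_t len) {
    return std::string(buf, len);
}

std::string make_password() {
    const char raw[] = "pass\0word";    // 10 chars: 9 data bytes + terminator
    return from_c_buffer(raw, sizeof raw - 1);
}
```

In C++14 and later, the std::string_literals suffix gives the same result directly: after using namespace std::string_literals;, the literal "pass\0word"s has length 9.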

So far std::vector<uint8_t> looks like a good option (thanks J.N. and Hurkyl):

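#include <cstdint>
#include <vector>
char p[] = "pass\0word";                   // 10 chars: 9 bytes + terminator
std::vector<uint8_t> password(p, p + 9);   // copies all 9 bytes, '\0' included :)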

Note: I haven't tried the iterator constructor with std::string, but this error is easy enough to make that it might be worth avoiding even the possibility.
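For what it's worth, the iterator-pair (range) constructor does not have the null-byte problem for either container, because it copies every element in [first, last). A sketch, assuming C++11; bytes_to_string and string_to_bytes are hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Both containers have a range constructor that copies every element in
// [first, last), including '\0' bytes, converting char <-> uint8_t as it goes.
std::string bytes_to_string(const std::vector<std::uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

std::vector<std::uint8_t> string_to_bytes(const std::string& s) {
    return std::vector<std::uint8_t>(s.begin(), s.end());
}
```

The risk the note describes is in reaching for the wrong overload, not in the range constructor itself.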

Lessons learned:

  • Test byte-handling methods with test vectors that contain null bytes.
  • Be careful when (and I would say avoid) using std::string to hold raw bytes.
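As a sketch of the first lesson (the function names and the test vector are hypothetical), a test vector with a leading null byte immediately separates a copy that trusts strlen() from one that uses the length the C API reported:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Two hypothetical ways to copy a byte buffer received from a C API.
// copy_with_strlen() treats the buffer as a C string and stops at the first
// '\0'; copy_with_length() uses the length the API reported.
std::vector<char> copy_with_strlen(const char* buf) {
    return std::vector<char>(buf, buf + std::strlen(buf));
}

std::vector<char> copy_with_length(const char* buf, std::size_t len) {
    return std::vector<char>(buf, buf + len);
}

// A test vector that deliberately contains '\0' bytes, including a leading one.
const char kTestBytes[] = {'\0', 'a', '\0', 'b', '\xff'};
```

With a test vector free of null bytes, both functions would return the same result and the bug would slip through.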
answered Oct 21 '22 02:10 by jtpereyda
