"Proper" way to store binary data with C++/STL

In general, what is the best way of storing binary data in C++? The options, as far as I can tell, pretty much boil down to using strings or vector<char>s. (I'll omit the possibility of char*s and malloc()s since I'm referring specifically to C++).

Usually I just use a string, but I'm not sure whether there are overheads I'm missing, or conversions the STL does internally that could corrupt binary data. Does anyone have any pointers (har) on this? Suggestions or preferences one way or another?

Sean Edwards asked Jan 13 '09 22:01



1 Answer

A vector of char is nice because the memory is contiguous. Therefore you can use it with a lot of C APIs, such as Berkeley sockets or file APIs. You can do the following, for example:

  std::vector<char> vect;
  ...
  send(sock, &vect[0], vect.size(), 0);  // last argument is the flags parameter

and it will work fine.
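A minimal sketch of the same idea without a socket, assuming C++11 or later for `data()`. `c_sum_bytes` is a hypothetical stand-in for a C API that takes a pointer and a length; it is not from any real library. One caveat it works around: `&vect[0]` is undefined on an empty vector, whereas `data()` is always safe to call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C API that takes a raw buffer and a length
// (think send() or fwrite()); not part of any real library.
unsigned long c_sum_bytes(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    unsigned long sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not treated as negative.
        sum += static_cast<unsigned char>(buf[i]);
    }
    return sum;
}

// Hand the vector's contiguous storage to the C-style function.
unsigned long sum_vector(const std::vector<char>& vect) {
    // data() is valid even when the vector is empty; &vect[0] is not.
    return c_sum_bytes(vect.data(), vect.size());
}
```

Embedded zero bytes are not special here: the length comes from `size()`, not from a terminator.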

You can essentially treat it just like any other dynamically allocated char buffer. You can scan up and down looking for magic numbers or patterns. You can parse it partially in place. When receiving from a socket, you can very easily resize it to append more data.
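As a rough illustration of appending and scanning, here is a sketch assuming C++11. The helper names `append_chunk` and `find_magic` are made up for this example, and the chunk is passed in as a plain pointer and length rather than read from a real socket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Append a received chunk to the end of the buffer. In real code the chunk
// would come from something like recv(); here it is just a pointer and length.
void append_chunk(std::vector<char>& buf, const char* chunk, std::size_t len) {
    assert(chunk != nullptr || len == 0);
    buf.insert(buf.end(), chunk, chunk + len);
}

// Return the offset of the first occurrence of `magic` in `buf`,
// or buf.size() if it does not occur.
std::size_t find_magic(const std::vector<char>& buf,
                       const std::vector<char>& magic) {
    auto it = std::search(buf.begin(), buf.end(), magic.begin(), magic.end());
    return static_cast<std::size_t>(std::distance(buf.begin(), it));
}
```

Because the search runs over the whole vector, a pattern that straddles two chunks is still found once both have been appended.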

The downside is that growing the vector past its capacity reallocates and copies the whole buffer, so resize or preallocate prudently. Deleting from the front of the array is also very inefficient, because every remaining element has to shift down. If you need to, say, pop just one or two chars at a time off the front of the data structure very frequently, copying to a deque before this processing may be an option. This costs you a copy, and deque memory isn't contiguous, so you can't just pass a pointer to a C API.
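Both tradeoffs can be sketched as follows, assuming C++11. `build_preallocated` and `drop_front` are hypothetical helper names, and the byte values are arbitrary filler.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Build a buffer of `total` bytes one byte at a time. Reserving up front
// means the push_back calls below never trigger a reallocation.
std::vector<char> build_preallocated(std::size_t total) {
    std::vector<char> buf;
    buf.reserve(total);
    for (std::size_t i = 0; i < total; ++i) {
        buf.push_back(static_cast<char>(i & 0x7f));
    }
    assert(buf.capacity() >= total);
    return buf;
}

// Copy the buffer into a deque (the one-time copy mentioned above), then pop
// up to `n` bytes off the front. pop_front() on a deque is constant time,
// whereas erasing the front of a vector shifts every remaining element.
std::deque<char> drop_front(const std::vector<char>& buf, std::size_t n) {
    std::deque<char> dq(buf.begin(), buf.end());
    while (n > 0 && !dq.empty()) {
        dq.pop_front();
        --n;
    }
    return dq;
}
```

If the data later has to go back to a C API, it needs to be copied into contiguous storage again, which is the cost to weigh against the cheaper front removal.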

Bottom line: learn about the data structures and their tradeoffs before diving in. That said, a vector of char is typically what I see used in general practice.

Doug T. answered Sep 26 '22 14:09