Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary version of iostream

I've been writing a binary version of iostreams. It essentially allows you to write binary files, but gives you much control over the format of the file. Example usage:

my_file << binary::u32le << my_int << binary::u16le << my_string;

Would write my_int as a unsigned 32-bit integer, and my_string as a length-prefixed string (where the prefix is u16le.) To read the file back, you would flip the arrows. Works great. However, I hit a bump in the design, and I'm still on the fence about it. So, time to ask SO. (We make a couple of assumptions, such as 8-bit bytes, 2s-complement ints, and IEEE floats at the moment.)

iostreams, under the hood, use streambufs. It's a fantastic design really -- iostreams code the serialization of an 'int' into text, and let the underlying streambuf handle the rest. Thus, you get cout, fstreams, stringstreams, etc. All of these, both the iostreams and the streambufs, are templated, usually on char, but sometimes also as a wchar. My data, however, is a byte stream, which best represented by 'unsigned char'.

My first attempts were to template the classes based on unsigned char. std::basic_string templates well enough, but streambuf does not. I ran into several problems with a class named codecvt, which I could never get to follow the unsigned char theme. This raises two questions:

1) Why is a streambuf responsible for such things? It seems like code-conversions lie way out of a streambuf's responsibility -- streambufs should take a stream, and buffer data to/from it. Nothing more. Something as high level as code conversions feels like it should belong in iostreams.

Since I couldn't get the templated streambufs to work with unsigned char, I went back to char, and merely casted data between char/unsigned char. I tried to minimize the number of casts, for obvious reasons. Most of the data basically winds up in a read() or write() function, which then invoke the underlying streambuf. (And use a cast in the process.) The read function is basically:

size_t read(unsigned char *buffer, size_t size)
{
    size_t ret;
    ret = stream()->sgetn(reinterpret_cast<char *>(buffer), size);
    // deal with ret for return size, eof, errors, etc.
    ...
}

Good solution, bad solution?


The first two questions indicate that more info is needed. First, projects such as boost::serialization were looked at, but they exist at a higher level, in that they define their own binary format. This is more for reading/writing at a lower level, where it is wished to define the format, or the format is already defined, or the bulk metadata is not required or desired.

Second, some have asked about the binary::u32le modifier. It is an instantiation of a class that holds the desired endianness and width, at the moment, perhaps signed-ness in the future. The stream holds a copy of the last-passed instance of that class, and used that in serialization. This was a bit of a workaround, I orginally tried overloading the << operator thusly:

bostream &operator << (uint8_t n);
bostream &operator << (uint16_t n);
bostream &operator << (uint32_t n);
bostream &operator << (uint64_t n);

However at the time, this didn't seem to work. I had several problems with ambiguous function call. This was especially true of constants, although you could, as one poster suggested, cast or merely declare it as a const <type>. I seem to remember that there was some other larger problem however.

like image 518
Thanatos Avatar asked Jul 19 '09 20:07

Thanatos


People also ask

What does iostream mean in c++?

iostream: iostream stands for standard input-output stream. This header file contains definitions of objects like cin, cout, cerr, etc. iomanip: iomanip stands for input-output manipulators.

What library iostream?

The iostream library is an object-oriented library that provides input and output functionality using streams. A stream is an abstraction that represents a device on which input and ouput operations are performed. A stream can basically be represented as a source or destination of characters of indefinite length.


1 Answers

I agree with legalize. I needed to do almost exactly what you're doing, and looked at overloading << / >>, but came to the conclusion that iostream was just not designed to accommodate it. For one thing, I didn't want to have to subclass the stream classes to be able to define my overloads.

My solution (which only needed to serialize data temporarily on a single machine, and therefore did not need to address endianness) was based on this pattern:

// deducible template argument read
template <class T>
void read_raw(std::istream& stream, T& value,
    typename boost::enable_if< boost::is_pod<T> >::type* dummy = 0)
{
    stream.read(reinterpret_cast<char*>(&value), sizeof(value));
}

// explicit template argument read
template <class T>
T read_raw(std::istream& stream)
{
    T value;
    read_raw(stream, value);
    return value;
}

template <class T>
void write_raw(std::ostream& stream, const T& value,
    typename boost::enable_if< boost::is_pod<T> >::type* dummy = 0)
{
    stream.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

I then further overloaded read_raw/write_raw for any non-POD types (e.g. strings). Note that only the first version of read_raw need be overloaded; if you use ADL correctly, the second (1-arg) version can call 2-arg overloads defined later and in other namespaces.

Write example:

int32_t x;
int64_t y;
int8_t z;
write_raw(is, x);
write_raw(is, y);
write_raw<int16_t>(is, z); // explicitly write int8_t as int16_t

Read example:

int32_t x = read_raw<int32_t>(is); // explicit form
int64_t y;
read_raw(is, y); // implicit form
int8_t z = numeric_cast<int8_t>(read_raw<int16_t>(is));

It's not as sexy as overloaded operators, and things don't fit on one line as easily (which I tend to avoid anyway, since debug breakpoints are line-oriented), but I think it turned out simpler, more obvious, and not much more verbose.

like image 122
Trevor Robinson Avatar answered Sep 19 '22 20:09

Trevor Robinson