
Inserters and Extractors reading/writing binary data vs text

I've been trying to read up on iostreams and understand them better. Occasionally I find it stressed that inserters (<<) and extractors (>>) are meant to be used for textual serialization. This comes up in a few places, but this article is a good example:

http://spec.winprog.org/streams/

Outside of the <iostream> universe, there are cases where the << and >> are used in a stream-like way yet do not obey any textual convention. For instance, they write binary encoded data when used by Qt's QDataStream:

http://doc.qt.nokia.com/latest/qdatastream.html#details

At the language level, the << and >> operators are yours to overload in your own project (hence what QDataStream does is clearly acceptable). My question is whether it is considered bad practice for those using <iostream> to use the << and >> operators to implement binary encodings and decodings. Is there (for instance) any expectation that a file written this way to disk should be viewable and editable with a text editor?

Should one always be using other method names and base them on read() and write()? Or should textual encodings be considered merely a default behavior that classes integrating with the standard library iostream can elect to ignore?


UPDATE: A key terminology issue here seems to be the distinction between I/O that is "formatted" vs "unformatted" (as opposed to the terms "textual" vs "binary"). I found this question:

writing binary data (std::string) to an std::ofstream?

It has a comment from @TomalakGeret'kal saying "I'd not want to use << for binary data anyway, as my brain reads it as "formatted output" which is not what you're doing. Again, it's perfectly valid, but I just would not confuse my brain like that."

The accepted answer to the question says it's fine as long as you use ios::binary. That seems to bolster the "there's nothing wrong with it" side of the debate...but I still don't see any authoritative source on the issue.
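
To make the formatted/unformatted distinction concrete, here is a minimal sketch (the helper names `formatted` and `unformatted` are made up for illustration). It relies on the fact that `std::ios::binary` only governs newline translation in file streams and does not change what `<<` produces. The flag has no effect on a string stream; it is passed here only to show that formatting is unaffected by it.

```cpp
#include <cstdint>
#include <ios>
#include <sstream>
#include <string>

// Formatted output: operator<< converts the value to its decimal text form,
// with or without std::ios::binary.
std::string formatted(std::uint32_t value) {
    std::ostringstream os(std::ios::out | std::ios::binary);
    os << value;
    return os.str();
}

// Unformatted output: write() copies the object's bytes as they sit in memory.
// The byte order of the result depends on the host machine.
std::string unformatted(std::uint32_t value) {
    std::ostringstream os(std::ios::out | std::ios::binary);
    os.write(reinterpret_cast<const char*>(&value), sizeof value);
    return os.str();
}
```

Here `formatted(258)` yields the three characters `258`, while `unformatted(258)` yields `sizeof(std::uint32_t)` raw bytes whose order depends on the host's endianness.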

HostileFork says dont trust SE asked Nov 22 '11 17:11



2 Answers

Actually, the operators << and >> are bit-shift operators; using them for I/O is, strictly speaking, already a misuse. However, that misuse is about as old as operator overloading itself, and I/O is today their most common use, so they are widely regarded as I/O insertion/extraction operators. I'm pretty sure that without the precedent of iostreams, nobody would use those operators for I/O (especially since C++11, whose variadic templates solve the main problem the operators solved for iostreams, namely passing any number of differently typed values in one statement, in a much cleaner way). On the other hand, from the language's point of view, overloaded operator<< and operator>> can mean whatever you want them to mean.
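
As a rough illustration of the variadic-template remark, here is a minimal sketch of a type-safe function taking any number of arguments. The name `print` is hypothetical, and the sketch still delegates the per-type formatting to the existing `<<` overloads; it only shows that chaining is no longer needed to pass several values.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Base case: no arguments left to print.
inline void print(std::ostream&) {}

// Recursive case: format the first argument, then handle the rest.
template <typename T, typename... Rest>
void print(std::ostream& os, const T& first, const Rest&... rest) {
    os << first;
    print(os, rest...);
}
```

A call such as `print(std::cout, "x=", 42, ' ', 1.5)` then takes the place of `std::cout << "x=" << 42 << ' ' << 1.5`.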

So the question boils down to what would be an acceptable use of those operators. For this, I think one has to distinguish two cases: First, new overloads working on iostream classes, and second, new overloads working on other classes, possibly designed to work like iostreams.

Let's first consider new operators on iostream classes. Let me start with the observation that the iostream classes are all about formatting (and the reverse process, which could be called "deformatting"; "lexing" IMHO wouldn't be quite the right term here, because the extractors don't determine the type but only try to interpret the data according to the type given). The classes responsible for the actual I/O of raw data are the streambufs. However, note that a proper binary file is not a file where you just dump internal raw data. Just like a text file (actually even more so), a binary file should have a well-specified encoding of the data it contains, especially if the file is expected to be read on different systems. Therefore the concept of formatted output makes perfect sense for binary files too; only the formatting is different (e.g. writing a pre-determined number of bytes, most significant byte first, for an integer value).
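
To illustrate what "formatting" can mean for binary data, here is a minimal sketch that encodes a 32-bit integer as exactly four bytes, most significant first, directly on a streambuf. The function names `put_u32_be` and `get_u32_be` are hypothetical, and error handling for short reads is left out.

```cpp
#include <cstdint>
#include <sstream>
#include <streambuf>
#include <string>

// Writes value as exactly four bytes, most significant byte first,
// regardless of the host's own byte order.
void put_u32_be(std::streambuf& sb, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        sb.sputc(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Reads four bytes produced by put_u32_be and reassembles the integer.
// Assumes four bytes are available; end-of-file is not checked.
std::uint32_t get_u32_be(std::streambuf& sb) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | static_cast<std::uint32_t>(sb.sbumpc() & 0xFF);
    }
    return value;
}
```

Because the byte order is fixed by the encoding rather than by the machine, a file written this way reads back identically on a system with a different native byte order.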

The iostreams themselves are classes intended to work on text files, that is, on files whose content is interpreted as a textual representation of data. A lot of built-in behaviour is optimized for that and may cause problems if used on binary files. An obvious example is that, by default, whitespace is skipped before any formatted input is attempted. For a binary file, this would clearly be the wrong behaviour. The use of locales doesn't make sense for binary files either (although one might argue that there could be a "binary locale", I don't think locales as defined for iostreams provide a suitable interface for that). Therefore I'd say writing binary operator<< or operator>> for iostream classes would be wrong.
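
The whitespace-skipping problem is easy to reproduce. In this minimal sketch (the helper names are invented for illustration), the byte 0x20 is ordinary data in a binary file, but the formatted extractor treats it as a space and silently discards it.

```cpp
#include <sstream>
#include <string>

// Reads one byte with the formatted extractor; with the default skipws
// setting, leading whitespace bytes are discarded first.
char first_via_extractor(const std::string& bytes) {
    std::istringstream in(bytes);
    char c = 0;
    in >> c;
    return c;
}

// Reads one byte with the unformatted get(); nothing is skipped.
char first_via_get(const std::string& bytes) {
    std::istringstream in(bytes);
    char c = 0;
    in.get(c);
    return c;
}
```

Given the two bytes 0x20 0x41, `first_via_extractor` returns 0x41 (`'A'`), whereas `first_via_get` returns 0x20 as stored.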

The other case is where you define a separate class for binary input/output (possibly reusing the streambuf layer for doing the actual I/O). Since we are now speaking about different classes, the argument above no longer applies. So the question now is: Should operator<< and operator>> on I/O be regarded as "text insertion/extraction operators" or more generally as "formatted data insertion/extraction operators"? The standard classes only use them for text, but then, there are no standard classes for binary I/O insertion/extraction at all, so the standard usage cannot distinguish between the two.

I personally would say that binary insertion/extraction is close enough to textual insertion/extraction that this usage is justified. Note that you also could make meaningful binary I/O manipulators, e.g. bigendian, littleendian and intwidth(n) to determine the format in which integers are to be output.
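
A separate binary stream class with such manipulators might look roughly like the following sketch. `BinaryOut` is a hypothetical class, not part of any library; it reuses a `std::streambuf` for the actual I/O and supports only 32-bit integers, and an `intwidth(n)` manipulator is left out to keep it short.

```cpp
#include <cstdint>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical binary output stream layered on a standard streambuf.
class BinaryOut {
public:
    enum class Order { Big, Little };

    explicit BinaryOut(std::streambuf& sb) : sb_(sb) {}

    void order(Order o) { order_ = o; }

    // Inserts a 32-bit integer as four bytes in the current byte order.
    BinaryOut& operator<<(std::uint32_t value) {
        for (int i = 0; i < 4; ++i) {
            const int shift = (order_ == Order::Big) ? 24 - 8 * i : 8 * i;
            sb_.sputc(static_cast<char>((value >> shift) & 0xFF));
        }
        return *this;
    }

    // Accepts manipulators: functions that take and return the stream.
    BinaryOut& operator<<(BinaryOut& (*manip)(BinaryOut&)) {
        return manip(*this);
    }

private:
    std::streambuf& sb_;
    Order order_ = Order::Big;
};

// Manipulators selecting the byte order for subsequent integers.
inline BinaryOut& bigendian(BinaryOut& out) {
    out.order(BinaryOut::Order::Big);
    return out;
}

inline BinaryOut& littleendian(BinaryOut& out) {
    out.order(BinaryOut::Order::Little);
    return out;
}
```

With a `BinaryOut out(buffer)`, the statement `out << bigendian << std::uint32_t{0x01020304} << littleendian << std::uint32_t{0x01020304}` writes the bytes 01 02 03 04 followed by 04 03 02 01. Since `BinaryOut` is unrelated to `std::ostream`, none of the text-oriented behaviour (whitespace skipping, locales) is inherited.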

Beyond that there's also the use of those operators for things which are not really I/O (and where you wouldn't even think of using the streambuf layer), like reading from or inserting into a container. In my opinion, that already constitutes misuse of the operators, because there the data isn't translated into or out of a different format. It is just stored in a container.

celtschk answered Nov 09 '22 21:11



The abstraction of the iostreams in the standard is that of a textually formatted stream of data; there is no support for any non-text format. There's nothing wrong with defining a different stream class whose abstraction is a binary format, but doing so within an iostream will likely break existing code and will not work.

James Kanze answered Nov 09 '22 22:11
