I have code that manipulates binary files using fstream with the binary flag set and using the unformatted I/O functions read and write. This works correctly on all systems I've ever used (the bits in the file are exactly as expected), but those are basically all U.S. English. I have been wondering about the potential for these bytes to be modified by a codecvt on a different system.
It sounds like the standard says that unformatted I/O behaves the same as putting characters into the streambuf using sputc/sgetc. These end up calling the streambuf's overflow or underflow functions, and it sounds like those pass the characters through a codecvt facet (e.g., see 27.8.1.4.3 in the C++ standard). For basic_filebuf, the creation of this codecvt is specified in 27.8.1.1.5. This makes it look like the results will depend on what basic_filebuf::getloc() returns.
So, my question is, can I assume that a character array written out using ofstream.write on one system can be recovered verbatim using ifstream.read on another system, no matter what locale configuration either person might be using on their system? I would make the following assumptions:
If the default locale isn't guaranteed to pass through this stuff unmodified on some system configuration (I don't know, Arabic or something), then what is the best way to write binary files using C++?
If you have the binary flag set, everything you write is written to the file verbatim, with no conversions (in particular, no newline translation). How you interpret the bytes is up to you (and possibly the locale).
One more thing: there is still a possibility of breakage across locales. For example, if your data source creates binary data whose format changes depending on the locale (a bad idea, by the way), loading that data on a machine with a different locale will cause trouble. That is a design error, though.
If you just use standard data types/structures that have the same format/layout no matter what locale they were created in, everything should be OK.
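To make the round trip concrete, here is a minimal sketch of writing a char array with write and reading it back with read. The helper names write_bytes and read_bytes are made up for illustration, and it assumes a C++11 compiler and that the program never changes the global locale.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: write `bytes` to `path` using unformatted output.
// Returns false if the file could not be opened or the write failed.
bool write_bytes(const std::string& path, const std::vector<char>& bytes) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: read the whole file back using unformatted input.
// Returns an empty vector if the file could not be opened.
std::vector<char> read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> bytes;
    char buf[4096];
    while (in) {
        in.read(buf, sizeof buf);
        // gcount() is the number of chars the last read actually extracted.
        bytes.insert(bytes.end(), buf, buf + in.gcount());
    }
    return bytes;
}
```

Feeding it bytes that text mode tends to mangle on some platforms (0x0A, 0x0D, 0x1A) plus values above 0x7F is a reasonable sanity check that nothing is being converted.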
Thanks for the help. I just thought it might be helpful to post some additional information about this that wouldn't fit in a comment.
The default locale for C++ programs is always the "C" locale (http://www.cplusplus.com/reference/clibrary/clocale/setlocale/). If this is the only locale used in your program, it means the behaviour doesn't depend on the particular locale configuration of the machine that it's running on. It also means that unformatted I/O for a char does not undergo any code conversion (wchar_t might be a different story though). This means that (given the assumptions in the question) read and write should allow binary data to be recovered unmodified.
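If you want to check this on a given system, a small sketch like the following might help. The function names are made up for illustration, and it assumes nothing in the program has called std::locale::global beforehand.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: name of the locale a newly constructed
// (not yet opened) file stream picks up from the global C++ locale.
std::string new_stream_locale_name() {
    std::ofstream out;
    return out.getloc().name();
}

// Hypothetical helper: true while the global C++ locale is still
// the classic "C" locale.
bool global_locale_is_classic() {
    return std::locale() == std::locale::classic();
}
```

As long as global_locale_is_classic() holds when the stream is constructed, the stream should not be picking up anything from the machine's locale configuration.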
(From reading the documentation) You can globally set the application's locale to match the system default by calling std::locale::global(std::locale("")), which means streams constructed from that point on will use the system default locale. Note that calling setlocale(LC_ALL, "") on its own changes only the C library's locale, not the locale that newly constructed streams pick up. To set it back to the "C" locale you can call std::locale::global(std::locale::classic()), which means this is what streams constructed in the future will use. You can also specify that the "C" locale should be used for a stream that's already constructed by calling stream.imbue(std::locale::classic()).
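For code that cannot control what the rest of the program does to the global locale, one defensive option is to imbue the classic locale before opening the file. This is only a sketch: open_binary_out is a made-up helper name, and it assumes a C++11 standard library in which file streams are movable.

```cpp
#include <fstream>
#include <ios>
#include <locale>
#include <string>

// Hypothetical helper: open a binary output stream pinned to the "C" locale,
// regardless of what the global locale is at the time of the call.
std::ofstream open_binary_out(const std::string& path) {
    std::ofstream out;
    // Imbue before open(), so the filebuf uses the classic locale's
    // codecvt facet from the first byte written.
    out.imbue(std::locale::classic());
    out.open(path, std::ios::binary | std::ios::trunc);
    return out;
}
```

The caller then uses the returned stream with write as usual, e.g. auto out = open_binary_out("data.bin"); followed by out.write(buffer, size);.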