I have written a C++ library that saves my data (a collection of custom structs, etc.) into a binary file. I currently use (i.e. create and consume) the files locally, on my Windows (XP) machine. For simplicity, let's think of the library in two parts: a writer (creates the files) and a reader or consumer (simply reads data from the files).
Recently, though, I would also like to consume (i.e. read), on my Linux machine, the data files I have created on my XP machine. I must point out at this stage that both machines are PCs (so they have the same endianness, etc.).
I can build a reader (and compile it for Linux [Ubuntu 9.10 to be precise]), since I am the library creator. My question, before I go down this road (of building the reader, etc.), is:
Assuming I have successfully built the reader for Linux,
can I simply copy files that were created on the Windows (XP) machine across to the Linux (Ubuntu 9.10) machine and use the Linux reader to read the copied files successfully?
Binary files are not human readable and require a special program or hardware processor that knows how to read the data inside the file. Only then can the instructions encoded in the binary content be understood and properly processed.
diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.
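The heuristic described above can be sketched in C++ as follows. This illustrates the idea only and is not diff's actual source: the 4096-byte window (kSniffBytes) and the function names looks_binary and file_looks_binary are hypothetical choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical number of leading bytes to inspect. diff's real value is
// system dependent; 4096 is only an illustrative assumption.
constexpr std::size_t kSniffBytes = 4096;

// Returns true if a null byte appears within the first kSniffBytes of `data`,
// i.e. the content would be classified as binary by this heuristic.
bool looks_binary(const std::string& data) {
    const std::size_t n = data.size() < kSniffBytes ? data.size() : kSniffBytes;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] == '\0') return true;
    }
    return false;  // every inspected byte was non-null: treat as text
}

// Reads up to kSniffBytes from the file at `path` and applies looks_binary.
// Returning false for an unreadable file is a choice made for this sketch.
bool file_looks_binary(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::string buf(kSniffBytes, '\0');
    const std::size_t got = std::fread(&buf[0], 1, buf.size(), f);
    std::fclose(f);
    buf.resize(got);  // keep only the bytes actually read
    return looks_binary(buf);
}
```

A null byte that appears only after the inspected window does not change the verdict, which is why a mostly-text file with late binary content can still be treated as text.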
Executable files, compiled programs, SAS and SPSS system files, spreadsheets, compressed files, and graphic (image) files are all examples of binary files.
A binary file transfer moves a pure binary file, such as an executable program, image, or video, to a remote location. Binary transfers preserve all eight bits of every byte, so the file arrives at the receiving end bit-for-bit identical to the original.
For the files to be binary compatible:
It's certainly possible for all of these conditions to be fulfilled, or for your data to happen not to hit any of the cases where they aren't. At the very least, though, I'd add some sanity checks and/or sentinel members to detect problems.
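As one possible form of such sanity checks, here is a minimal sketch assuming a hypothetical FileHeader written at the start of the file and a hypothetical `sentinel` member at the end of each Record. The magic value (kMagic, chosen arbitrarily here) also reveals a byte-order mismatch, because it reads back byte-swapped, and `record_size` catches a reader whose compiler uses different type sizes or padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical header written (raw) before the records.
struct FileHeader {
    std::uint32_t magic;        // always kMagic when written
    std::uint32_t version;      // bump whenever the record layout changes
    std::uint32_t record_size;  // sizeof(Record) on the machine that wrote the file
};

constexpr std::uint32_t kMagic = 0x4D594C42u;  // arbitrary value for this sketch
constexpr std::uint32_t kVersion = 1;

// Hypothetical record with a trailing sentinel member.
struct Record {
    std::int32_t id;
    double value;
    std::uint32_t sentinel;  // must always equal kSentinel
};
constexpr std::uint32_t kSentinel = 0xDEADBEEFu;

enum class HeaderCheck { Ok, BadMagic, SwappedByteOrder, BadVersion, BadRecordSize };

HeaderCheck check_header(const FileHeader& h) {
    if (h.magic == kMagic) {
        if (h.version != kVersion) return HeaderCheck::BadVersion;
        if (h.record_size != sizeof(Record)) return HeaderCheck::BadRecordSize;
        return HeaderCheck::Ok;
    }
    // A byte-swapped magic means the writer used the opposite endianness.
    const std::uint32_t swapped = ((kMagic & 0xFFu) << 24) | ((kMagic & 0xFF00u) << 8) |
                                  ((kMagic >> 8) & 0xFF00u) | (kMagic >> 24);
    return h.magic == swapped ? HeaderCheck::SwappedByteOrder : HeaderCheck::BadMagic;
}

// Reads the raw header from an already-open file; false on a short read.
bool read_header(std::FILE* f, FileHeader& out) {
    return std::fread(&out, sizeof out, 1, f) == 1;
}

// A damaged or misaligned record will almost certainly not end in kSentinel.
bool record_ok(const Record& r) { return r.sentinel == kSentinel; }
```

The reader would call read_header once and refuse the file unless check_header returns HeaderCheck::Ok, then test record_ok on each record it reads.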
Binary files should be compatible across machines with the same endianness.
The issue you may have in your code is the size of ints: you can't necessarily assume that the compilers on different OSes use the same size for int. So either copy blocks of bytes and cast them, or use fixed-width types such as int16_t, int32_t, etc.
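A minimal sketch of both suggestions, assuming a hypothetical DiskRecord type: its fields use the <cstdint> fixed-width types, a static_assert fails the build if a compiler adds unexpected padding, and load_int32 copies a block of bytes into an int32_t. It uses std::memcpy rather than a pointer cast, which avoids alignment and strict-aliasing problems; like a cast, it assumes the file's byte order matches the host's.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical on-disk record. Plain `long` would be risky here: it is
// 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux.
struct DiskRecord {
    std::int32_t id;
    std::int32_t count;
    std::int64_t timestamp;  // placed after two 4-byte fields, so no padding is needed
};

// Fail the build, rather than silently misreading files, if the layout changes.
static_assert(sizeof(DiskRecord) == 16, "unexpected padding in DiskRecord");

// "Copy a block of bytes": move 4 raw bytes from a file buffer into an int32_t.
std::int32_t load_int32(const unsigned char* bytes) {
    std::int32_t v;
    std::memcpy(&v, bytes, sizeof v);
    return v;
}
```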
Structs are not a file format, and you shouldn't try to use them as such.
When attempting to make structs work with fread and fwrite, there's a huge number of hacks to make it work. You byte-swap integers so that you can share files between little-endian and big-endian machines. You change your structs to use fixed-width integer types, so you can share between machines with different word sizes (such as between x86 and x64 machines). You add compiler-specific pragmas to control the padding of structs to share between compiler versions.
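For comparison, a minimal sketch of that struct-based approach with fixed-width types and packing. `#pragma pack(push, 1)` / `#pragma pack(pop)` is accepted by both MSVC and GCC; PackedRecord, write_packed and read_packed are hypothetical names, and the byte-swapping step is omitted because this sketch assumes both machines share the same byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Compiler-specific pragma: remove all padding between members.
#pragma pack(push, 1)
struct PackedRecord {
    std::uint8_t  kind;   // without packing, padding would usually follow this byte
    std::int32_t  id;     // fixed width so x86 and x64 builds agree
    std::uint16_t flags;
};
#pragma pack(pop)

static_assert(sizeof(PackedRecord) == 7, "packing did not take effect");

// Writes the record exactly as it sits in memory (host byte order).
bool write_packed(std::FILE* f, const PackedRecord& r) {
    return std::fwrite(&r, sizeof r, 1, f) == 1;
}

// Reads the record back byte-for-byte; only valid if the writer used the same layout.
bool read_packed(std::FILE* f, PackedRecord& r) {
    return std::fread(&r, sizeof r, 1, f) == 1;
}
```

Every guarantee here rests on pragmas and static_asserts rather than on the code itself, which is what makes the approach fragile.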
It works, but it's ugly. Not to mention, easy to get wrong.
Much like the recommendation in The byte order fallacy, a much better idea is to write code to read/write the fields individually. By writing your own code, you can ensure there's no padding, and you can choose integer sizes independently of the local size of integers, and you can support both endiannesses without byte-swapping (by reading/writing the bytes of an integer separately).
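A minimal sketch of the field-by-field approach, assuming a hypothetical Record with one 32-bit and one 16-bit field and a little-endian file format chosen for this example. Because put_u32/get_u32 use shifts on values rather than the in-memory representation, they produce and accept the same bytes on little- and big-endian hosts, and the 6-byte encoding has no padding whatever sizeof(Record) happens to be.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record; the on-disk layout is defined by encode/decode,
// not by the compiler's struct layout.
struct Record {
    std::int32_t id;
    std::uint16_t flags;
};

// Append v as 4 little-endian bytes, independent of host byte order.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

void put_u16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

std::uint32_t get_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

std::uint16_t get_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Fixed 6-byte encoding: 4 bytes of id, then 2 bytes of flags.
void encode(std::vector<std::uint8_t>& out, const Record& r) {
    put_u32(out, static_cast<std::uint32_t>(r.id));
    put_u16(out, r.flags);
}

Record decode(const std::uint8_t* p) {
    Record r;
    // Converting back to signed assumes two's complement, as on all mainstream compilers.
    r.id = static_cast<std::int32_t>(get_u32(p));
    r.flags = get_u16(p + 4);
    return r;
}
```

On the reading side you would fread the bytes into a buffer and call decode on it; the file format is then whatever encode writes, regardless of compiler or platform.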
Unlike the hacky approach, this is hard to get wrong. Further, because you don't rely on any compiler or architecture specific behaviors, either your code will work on all compilers and architectures, or none. If you do it right, you shouldn't have any platform-specific bugs.
There is one downside: individually reading/writing the fields will be slower than just using fread/fwrite directly. You can set up a buffer (uint8_t buffer[]), write the entirety of the data into it, and then write everything out at once. That might help, but it'll still be slower, because you still have to move the fields into the buffer one at a time. For most purposes, though, it'll be fast enough (the exceptions being embedded/real-time systems or extremely high-performance computing).
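The buffering idea might look like this, assuming a hypothetical Sample record stored as two 4-byte little-endian fields. The fields are still encoded one at a time, but the whole buffer goes to a single fwrite call, and loading uses a single fread before decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical record type for this sketch.
struct Sample {
    std::uint32_t id;
    std::uint32_t value;
};

constexpr std::size_t kEncodedSize = 8;  // two 4-byte fields, no padding

void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t get_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Encode every record into one buffer, then write it with one fwrite call.
bool save_all(std::FILE* f, const std::vector<Sample>& samples) {
    std::vector<std::uint8_t> buf;
    buf.reserve(samples.size() * kEncodedSize);
    for (const Sample& s : samples) {
        put_u32(buf, s.id);
        put_u32(buf, s.value);
    }
    return std::fwrite(buf.data(), 1, buf.size(), f) == buf.size();
}

// Read `count` records with one fread, then decode them field by field.
bool load_all(std::FILE* f, std::size_t count, std::vector<Sample>& out) {
    std::vector<std::uint8_t> buf(count * kEncodedSize);
    if (std::fread(buf.data(), 1, buf.size(), f) != buf.size()) return false;
    out.clear();
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint8_t* p = buf.data() + i * kEncodedSize;
        out.push_back(Sample{get_u32(p), get_u32(p + 4)});
    }
    return true;
}
```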