Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary files and cross platform compatibility

I have written a C++ library that saves my data (a collection of custom structs etc) into a binary file. I currently use (i.e. create and consume) the files locally, on my Windows (XP) machine. For simplicity, lets think of the library in two parts: a writer (Creates the files) and a reader or consumer (simply reads data from the files).

Recently though, I would like to also consume (i.e. read) the data files I have created on my XP machine, on my Linux machine. I must point out at this stage that both machines are PCs (so have the same endianess etc).

I can build a reader (and compile for Linux [Ubuntu 9.10 to be precise]), since I am the library creator. My question, before I embark down this road (of building the reader etc) is:

Assuming I have succesfully built the reader for Linux,

Can I simply copy accross, files that were created on the windows (XP) machine to the Linux (Ubuntu 9.10) machine and use the Linux reader to successfully read the copied over file?

like image 395
Stick it to THE MAN Avatar asked Dec 21 '09 15:12

Stick it to THE MAN


People also ask

Are binary files readable?

Binary files are not human readable and require a special program or hardware processor that knows how to read the data inside the file. Only then can the instructions encoded in the binary content be understood and properly processed.

Does Diff work on binary files?

diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.

What are examples of binary files?

Executable files, compiled programs, SAS and SPSS system files, spreadsheets, compressed files, and graphic (image) files are all examples of binary files.

What is a binary file transfer?

Transferring a pure binary file, such as an executable program, image or video, to a remote location. Binary file transfers maintain the integrity of all eight bits in each byte, and the file winds up at the receiving end bit for bit the same as it started.


3 Answers

For the files to be binary compatible:

  • endianness must match (as it does for you)
  • bitfield packing order must be the same
  • sizes and signedness of types must be the same
  • the compiler must make the same decisions about padding and alignment

It's certainly possible for all of these conditions to be fulfilled, or for you to not happen to be hitting any cases for which they are not. At the very least, though, I'd add some sanity checks and/or sentinel members to detect problems.

like image 74
moonshadow Avatar answered Oct 20 '22 23:10

moonshadow


Binary files should be compatible across machines with the same endianess.

The issue you may have in your code is the size of ints, you can't necessarily assume that the compiler on different OS's has the same size int. So either copy blocks of bytes and cast them, or use int16, int32 etc.

like image 31
Martin Beckett Avatar answered Oct 21 '22 01:10

Martin Beckett


Structs are not a file format, and you shouldn't try to use them as such.

When attempting to make structs work with fread and fwrite, there's a huge number of hacks to make it work. You byte-swap integers so that you can share files between little-endian and big-endian machines. You change your structs to use fixed-width integer types, so you can share between machines with different word sizes (such as between x86 and x64 machines). You add compiler-specific pragmas to control the padding of structs to share between compiler versions.

It works, but it's ugly. Not to mention, easy to get wrong.

Much like the recommendation in The byte order fallacy, a much better idea is to write code to read/write the fields individually. By writing your own code, you can ensure there's no padding, and you can choose integer sizes independently of the local size of integers, and you can support both endiannesses without byte-swapping (by reading/writing the bytes of an integer separately).

Unlike the hacky approach, this is hard to get wrong. Further, because you don't rely on any compiler or architecture specific behaviors, either your code will work on all compilers and architectures, or none. If you do it right, you shouldn't have any platform-specific bugs.

There is one downside; individually reading/writing the fields will be slower than just using fread/fwrite directly. You can set up a buffer (uint8_t buffer[]) and write the entirety of the data into it, and then write everything out at once, which might help, but it'll still be slower (because you'd still have to move the fields into the buffer one at a time), but for most purposes it'll still be fast enough (exceptions being embedded / real-time systems or extremely high performance computing).

like image 2
Kevin Avatar answered Oct 21 '22 00:10

Kevin