
How do binary files work? (From C++'s point of view) [closed]

I have some misunderstandings about binary files. I don't understand what a binary file really is. I know that text files are also binary files, and that a file has to be parsed in order to extract information from it. But unlike a text file, a binary file with the same contents looks different. For example, when I store my name, "Rishabh", in a binary file, the file contains not only Rishabh but also some extra unreadable characters. What are they? Why doesn't it store only the characters, like a text file does? And what are binary file formats, e.g. .3d, .zip, .mp3? As far as I know, for text files the extension specifies what the format is or how to process the file, like .dae, .xml, .htm; these formats contain tags to store data. But what about binary files? They don't need any tags, because the data is stored in the file like a variable, and we just copy the contents into the program's variables (I mean it is stored the way it is in memory). So why are these binary file formats different from one another? Why can't a single program read the contents of any file, including one whose format is unknown to the world and to me? And what is binary file format cracking?

Rishabh asked Nov 30 '25 06:11


2 Answers

All files have some kind of pre-determined encoding, since computers can't store anything but bit patterns in bytes on disk. A text file contains only the encodings for printable characters plus space, and a few other codes for end-of-line, tab, and maybe form feed and a few others related to character display on a device. Because the encoding in a text file is a well-known standard, and is quite common, there are functions in most, if not all, languages to deal specifically with that type of file. Most importantly, they know how to read a line at a time: they recognize the line-terminator character(s).
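As a small illustration of that line-at-a-time support in C++, here is a minimal sketch using `std::getline`. The helper name `read_lines` is hypothetical, chosen only for this example.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads a text file one line at a time. std::getline recognizes the
// line terminator and strips it, so each element holds just the text.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path);            // default open mode is text mode
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

Nothing comparable exists for an arbitrary binary file, because there is no universal notion of a "line" or "record" in raw bytes.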

If, however, you type the characters of your name in some program other than a text editor (say you use the text tool in Gimp or Microsoft Paint) and then save the result, the program has to save more information than just your name. Your name has a position on a canvas that must be saved. It also has a font and a size, and whether it is bold, italic or underlined, and all of that needs to be saved. The size of the canvas needs to be saved. The color being used, even if it is only black and white, needs to be saved. This encoding will be different from the encoding used to save the letters of your name. So if you open the file with a text editor, you will see some gibberish, since the text editor is expecting character encoding and knows nothing about the encoding Gimp uses for fonts, font sizes, x,y positions, etc.
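The same effect can be reproduced in a few lines of C++, and it is one plausible source of the "extra unreadable characters" in the question. The sketch below assumes a hypothetical `Record` struct; the field names and sizes are made up for illustration. Writing the whole struct stores the unused part of the name buffer and the raw bytes of the integer, not just the letters.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical record layout, for illustration only.
struct Record {
    char name[16];      // fixed-size buffer: unused bytes are written too
    std::int32_t age;   // stored as 4 raw bytes, not as the text "25"
};

// Writes one Record as raw bytes and returns how many bytes were written.
std::size_t write_record(const std::string& path, const char* name, std::int32_t age) {
    Record r{};                                      // zero-fill the whole record
    std::strncpy(r.name, name, sizeof r.name - 1);   // keep a terminating '\0'
    r.age = age;

    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&r), sizeof r);
    return sizeof r;
}

// Returns the size of a file in bytes, or -1 if it cannot be opened.
long long file_size(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<long long>(in.tellg()) : -1;
}
```

Calling `write_record("rec.bin", "Rishabh", 25)` produces a file of `sizeof(Record)` bytes, which is larger than the 7 letters of the name. A text editor shows the zero bytes and the integer's bytes as unreadable characters.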

The C++ standard library has no routines that understand any particular binary file format. Its routines for reading and writing binary files just read and write sequences of bytes. Since the fundamental type that holds a byte of data in C++ is a char (or unsigned char), the stream member functions for binary I/O have prototypes like

    ostream& write(const char* buffer, streamsize size);
    istream& read(char* buffer, streamsize size);

But the char pointer in this case should be considered as a "byte *" since the read/write functions are just moving bytes of data from/to disk or memory without any regard for character encodings.
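A minimal sketch of that "byte pointer" usage, assuming hypothetical helper names `save_u32` and `load_u32`: the integer is never converted to text, its in-memory bytes are simply copied to and from the file.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Stores a 32-bit integer as its raw in-memory bytes (host byte order).
bool save_u32(const std::string& path, std::uint32_t value) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return static_cast<bool>(out);
}

// Reads the 4 bytes back into an integer; returns false on a short read.
bool load_u32(const std::string& path, std::uint32_t& value) {
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return in.gcount() == static_cast<std::streamsize>(sizeof value);
}
```

Because the bytes are written in host byte order, a file produced this way is only guaranteed to read back correctly on a machine with the same endianness. That is one reason real formats specify their byte order explicitly.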

The C++ read/write routines don't know or care what the format or encoding is for the bytes they are moving. So it is left up to the programmer to write code that processes these bytes according to the pre-defined format of the file. However, the routines written to process a specific binary file format can be compiled into a library that can then be shared or sold, and used by many C++ programmers. For example, LibXL can be used to read the binary format of Excel files from a C++ program.
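To make "processing bytes according to a pre-defined format" concrete, here is a sketch of a reader and writer for a made-up format. The layout (a 4-byte magic `NAME`, one length byte, then the text) and the function names are assumptions invented for this example, not any real file format.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical format: 4-byte magic "NAME", 1 length byte, then the text.
const char kMagic[4] = {'N', 'A', 'M', 'E'};

bool write_name_file(const std::string& path, const std::string& name) {
    if (name.size() > 255) return false;            // length must fit in one byte
    std::ofstream out(path, std::ios::binary);
    out.write(kMagic, sizeof kMagic);
    out.put(static_cast<char>(name.size()));
    out.write(name.data(), static_cast<std::streamsize>(name.size()));
    return static_cast<bool>(out);
}

bool read_name_file(const std::string& path, std::string& name) {
    std::ifstream in(path, std::ios::binary);
    char magic[4];
    if (!in.read(magic, sizeof magic)) return false;
    if (std::memcmp(magic, kMagic, sizeof kMagic) != 0) return false;  // not our format
    char len_byte;
    if (!in.get(len_byte)) return false;
    std::size_t len = static_cast<unsigned char>(len_byte);
    name.assign(len, '\0');
    if (len > 0 && !in.read(&name[0], static_cast<std::streamsize>(len))) return false;
    return true;
}
```

A program that does not know this layout sees only nine or so opaque bytes. That is why each binary format needs its own reader, and why working out an undocumented layout takes reverse engineering.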

Scooter answered Dec 01 '25 21:12


From the perspective of C/C++, the only difference between text and binary files is how line endings are handled.

If you open a file in binary mode, then read reads exactly the bytes in the file, and write writes exactly the bytes which are in memory.

If you open a file in text mode, then whatever character or character sequence is conventionally used to represent the end of a line in a file is transformed into a single character when the file is read (it is written in source code as '\n', although it is only one character), and the \n is transformed back into the conventional end-of-line character or sequence when the file is written. Also, whether the last line of a text file must end with an end-of-line sequence is implementation-defined, and there may be a limit to the length of a line.

In Unix, the two modes are identical, because \n is a representation of the character code 10 (0A in hex), and that is precisely the conventional line-ending character. In Windows, by contrast, the conventional line-ending sequence is two bytes long -- {13,10} or {0D,0A}. \n is still 0A, so effectively the 0D preceding the 0A is deleted from the data read from the file, and an 0D is inserted before every 0A when data is written to the file.
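The translation can be observed by writing the same string in both modes and comparing file sizes. This is a minimal sketch; the helper name `written_size` is hypothetical.

```cpp
#include <fstream>
#include <string>

// Writes `text` with the given open mode and returns the resulting
// file size in bytes, or -1 if the file cannot be reopened.
long long written_size(const std::string& path, const std::string& text,
                       std::ios::openmode mode) {
    {
        std::ofstream out(path, mode);
        out << text;
    }   // the stream is closed (and flushed) here, before we measure
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<long long>(in.tellg()) : -1;
}
```

With the text `"a\nb\n"`, passing `std::ios::out | std::ios::binary` should give 4 bytes on any platform. Passing plain `std::ios::out` (text mode) gives 4 bytes on Unix-like systems and 6 on Windows, where each \n is written as 0D 0A.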

Some (much) older operating systems had no conventional line-ending character. Instead, all lines were padded with space characters to the exact same length, making it possible to seek directly to a specific line number. C libraries working in text mode would typically read exactly the line length, then delete the trailing spaces (if any), and finally add the code corresponding to \n (some such systems used EBCDIC instead of ASCII, so \n was a different integer value). When writing the data out, the \n would be deleted and replaced with exactly the number of spaces needed to bring the line to the standard length. Fortunately, those of us who don't work in a computing museum don't have to deal with that stuff any more, and Apple abandoned its use of 0D as the line-end character with the advent of OS X, so the text/binary difference is now, in practice, limited to Windows.
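The fixed-length scheme can be sketched with two string helpers. The names `record_to_line` and `line_to_record` are hypothetical, and the truncation of over-long lines is an assumption of this sketch rather than a description of any particular historical C library.

```cpp
#include <string>

// Reading: drop the trailing padding spaces, then append '\n'.
std::string record_to_line(const std::string& record) {
    std::size_t end = record.find_last_not_of(' ');
    std::string line = (end == std::string::npos) ? "" : record.substr(0, end + 1);
    line += '\n';
    return line;
}

// Writing: drop the '\n' and pad with spaces to the fixed record width.
// Lines longer than `width` are truncated here (an assumption of this sketch).
std::string line_to_record(std::string line, std::size_t width) {
    if (!line.empty() && line.back() == '\n') line.pop_back();
    line.resize(width, ' ');
    return line;
}
```

One consequence of this scheme is that trailing spaces in a line cannot survive a round trip, since they are indistinguishable from padding.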

rici answered Dec 01 '25 19:12