
What exactly causes binary file "gibberish"?

I haven't found an answer to this particular question; perhaps there isn't one. But I've been wondering for a while about it.

What exactly causes a binary file to display as "gibberish" when you look at it in a text editor? It's the same thing with encrypted files. Are the binary values of the file trying to be converted into ASCII? Is it possible to convert the view to display raw binary values, i.e. to show the 1s and 0s that make up the file?

Finally, is there a way to determine what program will properly open a data file? Many times, especially on Windows, a file is orphaned or otherwise not associated with a particular program. Opening it in a text editor sometimes tells you where it belongs, but most of the time it doesn't, because of the gibberish. If the extension doesn't provide any information, how can you determine what program it belongs to?

asked Oct 19 '08 05:10 by crystalattice



2 Answers

  • Are the binary values of the file trying to be converted into ASCII?

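Yes, more or less. The editor interprets each byte of the file as a character code in ASCII (or whatever encoding it assumes). Most bytes in a binary file were never meant to be text, and many of them land on ASCII control characters that aren't printable, which makes the display in a typical text editor even more bizarre.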

  • Is it possible to convert the view to display raw binary values, i.e. to show the 1s and 0s that make up the file?

It depends on your editor. What you want is a "hex editor", rather than a normal text editor. This will show you the raw contents of the file (typically in hexadecimal rather than binary, since the zeros and ones would take up a lot of space and be harder to read).
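As a rough illustration of what a hex editor shows, here is a minimal Python sketch. The function names `hex_dump` and `bit_dump` are made up for this example, and the sample bytes are the well-known 8-byte PNG signature; a real hex editor does far more than this.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as: offset, hex values, printable-ASCII column."""
    pad = width * 3 - 1  # width of the hex column when a row is full
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII (32..126) is shown as-is, everything else as '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)


def bit_dump(data: bytes) -> str:
    """Format each byte as eight 1s and 0s."""
    return " ".join(f"{b:08b}" for b in data)


sample = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of any PNG file
print(hex_dump(sample))
print(bit_dump(b"A"))  # 01000001
```

The bit view makes the point about space: one byte takes eight characters in binary but only two in hex.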

  • Finally, is there a way to determine what program will properly open a data file?

On Linux and other Unix-like systems there is a command-line program called "file" that will attempt to analyze the file (typically by looking for common header patterns, often called "magic numbers") and tell you what sort of file it is (for example text, audio, video, or XML). I'm not sure if there is an equivalent program for Windows. Of course, the output of this program is just a guess, but it can be very useful when you don't know what the format of a file is.
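To give a feel for the header-pattern idea, here is a small Python sketch. It is not how "file" is implemented (the real tool uses a much larger rule database); the table and the names `MAGIC_SIGNATURES`, `guess_type` and `guess_file_type` are invented for this example and cover only a handful of widely documented signatures.

```python
# A tiny, illustrative table of leading-byte signatures.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also used by many office formats)"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x7fELF", "ELF executable"),
    (b"\x1f\x8b", "gzip compressed data"),
]


def guess_type(data: bytes) -> str:
    """Guess a file type from its leading bytes; this is only a guess."""
    for signature, description in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return description
    return "unknown"


def guess_file_type(path: str) -> str:
    """Read the first few bytes of a file and guess its type."""
    with open(path, "rb") as handle:  # "rb": read raw bytes, no decoding
        return guess_type(handle.read(16))


print(guess_type(b"%PDF-1.4\n"))
print(guess_type(b"just some text"))
```

Like the real tool, this can be fooled: a file that merely happens to start with the right bytes will be misidentified.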

answered Sep 18 '22 10:09 by Ross


A binary file appears as gibberish because the data in it is designed for the machine to read and not for humans. Sadly, some of us get used to interpreting gibberish - albeit with somewhat specialized tools to help see the data better - but most people should not need to know.

Each byte in the file is treated as a character in the current code set (probably CP1252 on Windows). Byte value 65 is 'A', for example; you can find illustrative examples easily on the web. So, the bytes that make up the binary data are displayed according to the code set - as best as the text editor can. The editor doesn't try to interpret the structure of the binary data - it doesn't know how (only the program that wrote the file does).
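A short Python sketch of the difference, assuming a made-up record layout (a little-endian 32-bit integer followed by a 64-bit float): decoding the bytes as CP1252 stands in for what the text editor does, and `struct.unpack` stands in for the original program that knows the layout.

```python
import struct

# Store two numbers the way a program might write them to a binary file.
# The layout "<id" (little-endian int32 + float64) is invented for this example.
raw = struct.pack("<id", 1234567, 3.14159)

# A text editor knows nothing about that layout: it maps each byte to a
# character in its code set. Bytes with no CP1252 character are replaced.
as_text = raw.decode("cp1252", errors="replace")
print(repr(as_text))

# The "original program" knows the layout and gets the values back.
number, value = struct.unpack("<id", raw)
print(number, value)

# Byte value 65 really is 'A' in CP1252 (and in ASCII).
print(bytes([65]).decode("cp1252"))
```

The same twelve bytes are perfectly meaningful as numbers and meaningless as characters; nothing in the file says which reading is intended.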

As to how to detect what program created the file - you may be able to do that sometimes, but not easily or reliably. On Unix (or with Cygwin on Windows) the 'file' program may be able to help. This program looks at the first few bytes to try to guess the file type, which in turn often points to the program that uses it.

Encrypted data is supposed to look like gibberish. If it doesn't look like gibberish, then it probably isn't very well encrypted.
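One rough way to put a number on "looks like gibberish" is the Shannon entropy of the byte values. The sketch below is only an illustration: the function name `byte_entropy` is made up, `os.urandom` is used as a stand-in for well-encrypted data (an assumption, since no real cipher is involved), and high entropy alone does not prove that data is encrypted, because compressed data scores high too.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (one repeated value) to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


english = b"The quick brown fox jumps over the lazy dog. " * 200
random_like = os.urandom(len(english))  # stand-in for encrypted bytes

print(f"plain text : {byte_entropy(english):.2f} bits/byte")
print(f"random data: {byte_entropy(random_like):.2f} bits/byte")
```

Plain text should score noticeably lower than the random bytes, which should come out close to 8 bits per byte.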

answered Sep 17 '22 10:09 by Jonathan Leffler