 

What exactly is the textual representation of Binary data?

Tags: text, binary

Sometimes when you download a compiled binary file with the wrong MIME type, or when you run the "more" command on a binary file, for example, you get a bunch of "gobbledygook", for lack of a better term.

For example, this is a snippet of what I see when I run "more" from the command line on a very simple C program compiled with gcc on OS X.

<94>^^^@^@ESC^@^@^@^^^A^@^@<A8>^^^@^@.^@^@^@^N^D^@^@^P ^@^@@^@^@^@^O^D^@^@^L ^@^@H^@^@^@^O^D^@^@^H ^@^@P^@^@^@^O
^D^@^@^@ ^@^@\^@^@^@^C^@^P^@^@^P^@^@p^@^@^@^O^A^@^@b^_^@^@y^@^@^@^O^D^@^@^D ^@^@<82>^@^@^@^O^A^@^@<B6>^^^@^@<88>
^@^@^@^O^A^@^@T^_^@^@<8D>^@^@^@^O^A^@^@T^^^@^@<93>^@^@^@^A^@^A^B^@^@^@^@<99>^@^@^@^A^@^A^B^@^@^@^@^L^@^@^@^M^@^@
^@ ^@dyld_stub_binding_helper^@__dyld_func_lookup^@dyld__mach_header^@_NXArgc^@_NXArgv^@___progname^@__mh_execute
_header^@_average^@_environ^@_main^@_sum^@start^@_exit^@_printf^@^@^@^@

Can someone explain in simple terms why this is? What is happening when a text editor or the plain text MIME type tries to interpret binary data? Does the ^@ mean anything in this context? Why is there some text and some gobbledygook? Is there any standard for the way this binary data is represented as text? Why is it not simply 1s and 0s?

I can conceptually understand ASCII or Unicode as a representation of characters in a number system that can be reduced down to binary 1s and 0s, a number system that the CPU understands. But at a higher level I am trying to get my head around what binary data is. I guess I want to "see the abstraction", if that makes sense.

Is there a way to "see" binary data in any kind of meaningful way in a text editor?

Gordon Potter asked Dec 08 '22 06:12



2 Answers

Binary files and text files are the same thing to a computer; after all, both are just 0s and 1s. What you see as the content of a file depends on the program you use to view it.
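
To make the "all 0s and 1s" point concrete, here is a minimal Python sketch that prints the bits behind a few bytes. The helper name `to_bits` is made up for illustration, and the second sample assumes the first three bytes of the dump in the question are 0x94, 0x1E and 0x00, as read off the `<94>^^^@` at its start.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


# Bytes that a plain text file might hold.
text_bytes = "Hi\n".encode("ascii")
# Bytes assumed from the start of the dump in the question.
binary_bytes = bytes([0x94, 0x1E, 0x00])

print(list(text_bytes), to_bits(text_bytes))
print(list(binary_bytes), to_bits(binary_bytes))
```

Both samples are stored the same way; the only difference is whether the byte values happen to fall in the range a viewer knows how to draw as characters.
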
Text editors (try to) interpret the 0s and 1s as characters and show you the characters they get, which you can read as a document. They assume that the files you give them are text files in some character encoding, such as ASCII. This is not true of computer files in general, which can contain any kind of binary data, not necessarily encoded characters. When this happens, instead of giving you an error message, some text editors show you an ugly and meaningless rendering of the data in the file (as they do not understand the data anyway).
Hex editors are more of a tool for geeks: they show you the raw data in hexadecimal (a more readable format than binary). Some hex editors also show the ASCII characters they detect alongside the hex, so it's even more convenient.
Alex gave you a very cool command line tool, but if you want a GUI, a quick Google search for "hex editor" will give you plenty of programs to try.
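
For a rough idea of what a hex view looks like, here is a small Python sketch, not any particular editor's implementation. The function name `hexdump` and the sample bytes are illustrative assumptions; the layout (offset, hex values, then printable ASCII with dots for everything else) mimics what many hex tools show.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of "xx " groups
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  |{ascii_part}|")
    return "\n".join(lines)


# Made-up sample resembling the symbol names at the end of the dump.
sample = b"\x00_main\x00_sum\x00start\x00"
print(hexdump(sample))
```

To look at a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function.
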

phunehehe answered Jun 05 '23 16:06



There really isn't a significant difference between text and binary files, save for the range of byte values used within the files. A basic text editor converts each value to a character based on the character encoding or code page in use (ASCII, ANSI, and so on).
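
As a small illustration of how the encoding decides which characters appear, the Python sketch below decodes the same three bytes three different ways. The byte values and the choice of encodings are arbitrary examples, not taken from the question's file.

```python
# Three arbitrary byte values: two in the ASCII range, one above it.
raw = bytes([0x48, 0x69, 0xE9])

decoded = {
    "latin-1": raw.decode("latin-1"),
    "cp437": raw.decode("cp437"),
    # 0xE9 on its own is not valid UTF-8, so ask for a replacement character.
    "utf-8": raw.decode("utf-8", errors="replace"),
}

for name, text in decoded.items():
    print(f"{name:8} {text!r}")
```

The first two bytes come out as "Hi" in every case because these encodings agree on the ASCII range; only the last byte changes with the encoding.
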

You're seeing the character "^@" because the value of the byte in the file at that position is 0 (the nul character). The nul character is not printable, and so the more program is displaying it using caret notation.
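
Caret notation can be sketched in a few lines of Python. This is an approximation of the display in the question, not the actual code of more: control bytes become `^` plus the character 64 positions higher, and bytes of 0x80 and above are shown as `<XX>` in hex, as the dump appears to do. The function name `caret_notation` is made up, and unlike the dump, which prints ESC for byte 0x1B, this sketch would print `^[`.

```python
def caret_notation(data: bytes) -> str:
    """Render bytes roughly the way the pager output in the question looks."""
    out = []
    for b in data:
        if b < 0x20 or b == 0x7F:
            # Control characters: flip bit 6, so 0x00 -> '@', 0x01 -> 'A', 0x7F -> '?'.
            out.append("^" + chr(b ^ 0x40))
        elif b < 0x7F:
            # Printable ASCII is shown unchanged.
            out.append(chr(b))
        else:
            # Bytes outside ASCII are shown as hex in angle brackets.
            out.append(f"<{b:02X}>")
    return "".join(out)


print(caret_notation(b"\x94\x1e\x00\x00_main\x00"))
```

This also treats newlines and tabs as control characters, which a real pager would normally display as whitespace instead.
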

You can open the file in a hex editor, which is an editor designed for binary data: it shows each byte's value instead of trying to render it as text. I am not very familiar with Mac software, but a free hex editor can be downloaded at http://hexedit.sourceforge.net/.

Basic text editors/viewers assume that anything you open with them is meant to be read as plain text.

EDIT: Incorporated Mike Spross's corrections re: ^@.

David Andres answered Jun 05 '23 17:06
