What is the deciding factor for classifying a file as a binary or text file?
E.g., consider the C program below.
NOTE: Before running the program, make sure binary.txt doesn't exist.
Observation:
The program creates the file "binary.txt" with the contents TEXTFILE (on a little-endian machine with a 4-byte int).
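#include <stdio.h>
#include <stdlib.h>   /* for exit() */

int main(void)
{
    /* On a little-endian machine with 4-byte int, these two values are
       stored as the bytes 'T','E','X','T' and 'F','I','L','E'. */
    int arr[2] = {1415071060, 1162627398};
    FILE *fp = fopen("binary.txt", "wb");
    if (fp == NULL)
    {
        printf("Error opening file\n");
        exit(1);
    }
    fwrite(arr, sizeof(arr), 1, fp);
    fclose(fp);
    return 0;
}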
However, only the creator knows that it was created in binary mode and that it should be called a binary file.
Anyone who opens the file "binary.txt" would think it is a text file.
What should a general user call this file: binary or text?
@JohnBollinger summarized it best in a comment.
text vs. binary is not a fundamental file characteristic on modern operating systems, but rather a differentiation between how files are interpreted.
Let's say a file contains four bytes with the following hex values of the bytes:
0x41 0x42 0x43 0x44
If you interpret those bytes as characters in a system that uses ASCII encoding, you will get the characters ABCD.
If you treat those bytes as a 4-byte integer, you will get the value 0x41424344 (1094861636 in decimal) in a big endian system and 0x44434241 (1145258561 in decimal) in a little endian system.
As far as the computer is concerned, it's all binary. As to what they mean, it's all a matter of interpretation.
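To make the point concrete, here is a minimal sketch that applies the three interpretations to the same four bytes. The helper names (as_text, as_big_endian, as_little_endian) are made up for this illustration, and the text interpretation assumes an ASCII-compatible character set.

```c
#include <stdint.h>
#include <string.h>

/* Interpret 4 bytes as text: copy them into a NUL-terminated string. */
void as_text(const unsigned char b[4], char out[5])
{
    memcpy(out, b, 4);
    out[4] = '\0';
}

/* Interpret 4 bytes as an integer whose first byte is the most significant. */
uint32_t as_big_endian(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret 4 bytes as an integer whose first byte is the least significant. */
uint32_t as_little_endian(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

Given the bytes 0x41 0x42 0x43 0x44, as_text produces "ABCD", as_big_endian returns 1094861636, and as_little_endian returns 1145258561. The bytes never change; only the reading of them does.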
On modern operating systems, there is no distinction at the file system level between text files and binary files. On legacy systems, the C library implements a series of tricks to translate newlines between OS specific representations (such as 0x0D 0x0A) and the single byte representation '\n' for the C program reading the file in text mode. This compatibility layer must not be used when dealing with actual binary contents, for which the b option must be used in fopen().
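A rough way to observe this translation layer is to write the same string in text mode and in binary mode and count the bytes that end up on disk. The function name stored_size and the file names used with it are hypothetical, chosen only for this sketch.

```c
#include <stdio.h>

/* Write "line\n" to path using the given fopen() mode, then return the
   number of bytes actually stored in the file, or -1 on error.
   The file is removed again afterwards. */
long stored_size(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int ok = (fputs("line\n", fp) != EOF);
    if (fclose(fp) != 0 || !ok)
        return -1;

    /* Read back in binary mode so that no newline translation hides bytes. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long count = 0;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    remove(path);
    return count;
}
```

With the "wb" mode the result is always 5, one byte per character written. With the "w" mode the result is also 5 on Unix-like systems, where text and binary modes behave the same, but it is typically 6 on Windows, where '\n' is stored as 0x0D 0x0A.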
Older operating systems used to have different representations for text and binary files, but most of these are obsolete nowadays.
Conversely, many file systems keep track of executable files with some specific information such as mode bits on Unix FS. These executable files can be binary, containing one form or another of executable code, while others are text files containing scripts.
In your example, whether the file should be seen as binary or text is a matter of intent. If the creator of the file intended for it to be read as binary, naming it binary.txt is confusing, as the filename extension .txt is routinely used to indicate generic text files. sample.bin would be much more obvious.
How to interpret the contents of a file is important for programmers and casual users alike: on legacy systems, loading and saving a file as text may change its contents, unless you use tools that are terminally anal about preserving contents.
For example qemacs, a programmer's editor inspired by emacs, makes extensive efforts upon loading a file to determine the best mode for displaying and editing the contents:
If the file is written back without modifications, the contents are preserved so binary files that happen to have textual contents are unmodified. Otherwise, the above tests determine the correct conventions for encoding new contents.