
Why is a binary file not a text file, while all text files are binary files?

Tags:

c

What is the deciding factor for classifying a file as binary or text?

For example, consider the C program below, which does the following:

  1. Create a file in binary mode.
  2. Write two integers into the file "binary.txt".

NOTE: Before running the program, make sure binary.txt doesn't exist.

Observation:

The file "binary.txt" is created, and its contents are TEXTFILE.

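#include <stdio.h>
#include <stdlib.h>   /* for exit() */

int main(void)
{
   /* With 4-byte little-endian ints, these values are the bytes "TEXT" and "FILE". */
   int arr[2] = {1415071060, 1162627398};
   FILE *fp = fopen("binary.txt", "wb");

   if (fp == NULL)
   {
       printf("Error opening file\n");
       exit(1);
   }
   /* Write the raw bytes of both ints, with no conversion to text. */
   fwrite(arr, sizeof(arr), 1, fp);
   fclose(fp);
   return 0;
}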

However, only the creator knows that the file was written in binary mode, so arguably it should be called a binary file.

Anyone else who opens "binary.txt" will think it is a text file.

What should a general user call this file: binary or text?

Prathibha asked Jan 03 '23 00:01


2 Answers

@JohnBollinger summarized it best in a comment.

text vs. binary is not a fundamental file characteristic on modern operating systems, but rather a differentiation between how files are interpreted.

Let's say a file contains four bytes with the following hex values:

0x41 0x42 0x43 0x44

If you interpret those bytes as characters in a system that uses ASCII encoding, you will get the characters ABCD.

If you treat those bytes as a 4-byte integer, you will get the value 0x41424344 (1094861636 in decimal) in a big endian system and 0x44434241 (1145258561 in decimal) in a little endian system.

As far as the computer is concerned, it's all binary. As to what the bytes mean, it's all a matter of interpretation.

R Sahu answered Jan 04 '23 12:01


On modern operating systems, there is no distinction at the file system level between text files and binary files. On legacy systems, the C library implements a series of tricks to translate newlines between OS-specific representations (such as 0x0D 0x0A) and the single-byte representation '\n' seen by a C program reading the file in text mode. This compatibility layer must not be used when dealing with actual binary contents, for which the b flag must be included in the mode string passed to fopen().

Older operating systems used to have different representations for text and binary files, but most of these are obsolete nowadays.

Conversely, many file systems keep track of executable files with some specific information, such as mode bits on Unix file systems. Some of these executable files are binary, containing one form or another of executable code, while others are text files containing scripts.

In your example, whether the file should be seen as binary or text is a matter of intent. If the creator of the file intended for it to be read as binary, naming it binary.txt is confusing, as the filename extension .txt is routinely used to indicate generic text files. sample.bin would be much more obvious.

How to interpret the contents of a file is important for programmers and casual users: on legacy systems, loading and saving a file as text may change its contents, unless you use tools that are terminally anal about preserving contents.

For example qemacs, a programmer's editor inspired by emacs, makes extensive efforts upon loading a file to determine the best mode for displaying and editing the contents:

  • binary vs. text mode (defaulting to hex display for binary)
  • line termination convention
  • character encoding
  • programming language or other specific content sensitive display options...

If the file is written back without modifications, the contents are preserved so binary files that happen to have textual contents are unmodified. Otherwise, the above tests determine the correct conventions for encoding new contents.

chqrlie answered Jan 04 '23 12:01