
How does Perl know a file is binary?

Tags:

perl

I know you can use the file test operator -B to test if a file is binary, but how does Perl implement this internally?

Joseph Gordon asked May 22 '09 18:05



2 Answers

From perldoc -f -B:

The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it’s a -B file; otherwise it’s a -T file. Also, any file containing null in the first block is considered a binary file. If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on a null file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in "next unless -f $file && -T $file".
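The rules in that passage can be sketched in a few lines. This is a minimal Python illustration of the documented heuristic, not Perl's actual C implementation. The 512-byte block size, the set of control characters treated as ordinary text, and the names `looks_binary` and `looks_text` are all assumptions made for the example; only the 30% threshold, the null-byte rule, and the empty-file behaviour come from the documentation.

```python
BLOCK_SIZE = 512       # assumption: how much "the first block or so" is
ODD_THRESHOLD = 0.30   # the ">30%" figure from the quoted documentation

# Assumption: control characters that count as ordinary text
# (backspace, tab, newline, form feed, carriage return, escape).
TEXT_CONTROLS = frozenset(b"\b\t\n\f\r\x1b")


def is_odd(byte: int) -> bool:
    """True for a byte with the high bit set or an unusual control code."""
    return byte >= 0x80 or (byte < 0x20 and byte not in TEXT_CONTROLS)


def looks_binary(data: bytes) -> bool:
    """Rough analogue of -B applied to the first block of a file."""
    block = data[:BLOCK_SIZE]
    if not block:
        return True    # both tests are true on an empty file
    if 0 in block:
        return True    # any null byte in the first block means binary
    odd = sum(1 for b in block if is_odd(b))
    return odd / len(block) > ODD_THRESHOLD


def looks_text(data: bytes) -> bool:
    """Rough analogue of -T: the opposite verdict, except for empty input."""
    block = data[:BLOCK_SIZE]
    return not block or not looks_binary(block)
```

To try it on a real file, read the first block in binary mode, for example `looks_binary(open(path, "rb").read(512))`, where `path` is whatever file you want to check.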
chuck answered Nov 15 '22 23:11


According to Chapter 11 of the book Learning Perl:

The answer is **Perl cheats**: it opens the file, looks at the first few thousand bytes, and makes an educated guess. If it sees a lot of null bytes, unusual control characters, and bytes with the high bit set, then that looks like a binary file. If there’s not much weird stuff, then it looks like text. It sometimes guesses wrong. If a text file has a lot of Swedish or French words (which may have characters represented with the high bit set, as some ISO-8859-something variant, or perhaps even a Unicode version), it may fool Perl into declaring it binary. So it’s not perfect, but if you need to separate your source code from compiled files, or HTML files from PNGs, these tests should do the trick.
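To get a feel for why the encoding matters, here is a small Python sketch (again, not Perl's own code) that measures the two things the passage mentions: the share of bytes with the high bit set, and whether any null bytes are present. The sample string and the helper names `high_bit_fraction` and `has_null` are made up for illustration.

```python
def high_bit_fraction(data: bytes) -> float:
    """Fraction of bytes whose high bit is set (value >= 0x80)."""
    if not data:
        return 0.0
    return sum(1 for b in data if b >= 0x80) / len(data)


def has_null(data: bytes) -> bool:
    """True if the data contains at least one null byte."""
    return 0 in data


sample = "smörgåsbord på ön"  # hypothetical sample text

for encoding in ("ascii", "latin-1", "utf-8", "utf-16-le"):
    try:
        raw = sample.encode(encoding)
    except UnicodeEncodeError:
        # Plain ASCII has no way to represent the accented letters.
        print(f"{encoding:10} cannot represent the sample")
        continue
    print(f"{encoding:10} high-bit: {high_bit_fraction(raw):.0%}  "
          f"null bytes: {has_null(raw)}")
```

In a single-byte encoding each accented letter contributes one high-bit byte, in UTF-8 it contributes two, and UTF-16 stores every ASCII letter with a null byte beside it. So ordinary prose with the occasional accent stays well below any "lots of weird stuff" cutoff, while text packed with non-ASCII characters, or any UTF-16 file, is much more likely to be taken for binary.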
TStamper answered Nov 15 '22 23:11