
How reliable is the -B file test?

When I open a SQLite database file, there is a lot of readable text at the beginning of the file. How likely is it that a SQLite file is wrongly filtered out by the -B file test?

#!/usr/bin/env perl
use warnings;
use strict;
use 5.10.1;
use File::Find;

my $dir = shift;
my $databases = [];    # start empty so the final count works even when nothing matches

find( {
    wanted     => sub {
        my $file = $File::Find::name;
        return if not -B $file;
        return if not -s $file;
        return if not -r $file;
        say $file;
        open my $fh, '<', $file or die "$file: $!";
        my $firstline = readline( $fh ) // '';
        close $fh or die $!;
        push @$databases, $file if $firstline =~ /\ASQLite\sformat/;
    },
    no_chdir   => 1,
},
$dir );

say scalar @$databases;
sid_com asked Jan 11 '13 17:01





2 Answers

The perlfunc man page has the following to say about -T and -B:

The -T and -B switches work as follows. The first block or so of the file is
examined for odd characters such as strange control codes or characters with
the high bit set. If too many strange characters (>30%) are found, it's a -B
file; otherwise it's a -T file. Also, any file containing a zero byte in the
first block is considered a binary file. 
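To make the quoted rule concrete, here is a rough sketch of it in plain Perl. It is only an approximation of the documented behaviour, not Perl's actual implementation, and the helper name looks_binary is made up for illustration. It also shows why the rule matters for this question: the magic string at the start of a SQLite 3 file is "SQLite format 3" followed by a zero byte, so the zero-byte rule should already apply to a well-formed database file.

```perl
use strict;
use warnings;

# Hypothetical helper: approximates the documented -B rule on a block of bytes.
# This is a sketch, not what perl does internally.
sub looks_binary {
    my ($block) = @_;
    return 0 if $block eq '';        # avoid dividing by zero; real -T/-B treat empty files specially
    return 1 if $block =~ /\0/;      # any zero byte in the block means "binary"
    # Count control codes (other than common whitespace) and bytes with the high bit set.
    my $odd = () = $block =~ /[^\t\n\r\f\x08\x1b\x20-\x7e]/g;
    return $odd / length($block) > 0.30 ? 1 : 0;
}

# The 16-byte magic string at the start of a SQLite 3 database file.
my $sqlite_header = "SQLite format 3\0";

print looks_binary($sqlite_header) ? "binary\n" : "text\n";   # prints "binary"
print looks_binary("plain text\n") ? "binary\n" : "text\n";   # prints "text"
```

The readable text at the start of the file does not help it pass as text here, because a single zero byte in the examined block is enough to classify it as binary.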

Of course you could now do a statistical analysis of a number of SQLite files: parse their "first block or so" for "odd characters" and calculate how often they occur. That would give you an idea of how likely it is that -B fails for SQLite files.

However, you could also go the easy route. Can it fail? Yes, it's a heuristic. And a bad one at that. So don't use it.

File type recognition on Unix is usually done by inspecting the file's content, and that work has already been done for you: libmagic, the library behind the file command-line tool. From Perl you can use it via e.g. File::LibMagic; File::MMagic is a pure-Perl module that takes the same approach.
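For a single known format, content-based detection can be as small as comparing the first bytes against the format's signature. The sketch below uses only core Perl and assumes the SQLite 3 magic string ("SQLite format 3" plus a terminating zero byte, 16 bytes in total). The helper name is_sqlite3 is hypothetical; a general-purpose tool such as libmagic checks many signatures like this one.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file starts with the SQLite 3 magic string.
sub is_sqlite3 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return 0;   # unreadable file: treat as no match
    my $n = read $fh, my $header, 16;          # read exactly the 16 signature bytes
    close $fh;
    return 0 unless defined($n) && $n == 16;   # read error or file too short
    return $header eq "SQLite format 3\0" ? 1 : 0;
}

# Report on any files named on the command line.
if (@ARGV) {
    printf "%s: %s\n", $_, is_sqlite3($_) ? 'SQLite 3' : 'other' for @ARGV;
}
```

Reading a fixed 16 bytes in raw mode avoids depending on where the first newline happens to fall in a binary file, which is what a readline-based check does.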

Moritz Bunkus answered Sep 18 '22 03:09



Well, all files are technically a collection of bytes, and thus binary. Beyond that, there is no accepted definition of binary, so it's impossible to evaluate -B's reliability unless you care to posit a definition by which it is to be evaluated.

ikegami answered Sep 20 '22 03:09
