
How to list all binary file extensions within a directory tree?

I need to build a list of all the file extensions of binary files located within a directory tree.

The main question is how to distinguish a text file from a binary one; the rest should be a piece of cake.

EDIT: This is the closest I've got. Any better ideas?

find . -type f|xargs file|grep -v text|sed -r 's:.*\.(.*)\:.*:\1:g'
dukeofgaming asked Mar 21 '12 21:03


4 Answers

Here's a trick to find the binary files:

grep -r -m 1 "^"  <Your Root> | grep "^Binary file"

The -m 1 option makes grep stop reading each file after the first match, so it does not scan whole files.
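Newer GNU grep releases (3.5 and later, as far as I know) send the "binary file matches" message to standard error, so filtering the pipe on "^Binary file" may come back empty there. Here is a minimal sketch that avoids parsing messages, assuming GNU grep's -I and -L options. The function names list_binary_files and list_binary_extensions are made up for illustration, and the extension rule (text after the last dot in the file name) is my assumption, so a dotfile such as .bashrc is reported as extension "bashrc".

```shell
# Hypothetical helper: print the files under a directory that grep treats as binary.
# -I makes grep treat binary files as non-matching and -L lists files without a
# match. Empty files are listed too, because they have no line for "^" to match.
list_binary_files() {
    grep -rIL "^" "${1:-.}"
}

# Hypothetical helper: reduce those paths to a sorted list of unique extensions.
# Paths whose file name contains no dot are skipped by sed -n ... p.
list_binary_extensions() {
    list_binary_files "$1" | sed -n 's:.*/[^/]*\.\([^./]*\)$:\1:p' | sort -u
}

# Usage: list_binary_extensions /path/to/root
```

Whether grep's idea of "binary" (roughly, the file contains NUL bytes or bytes that are invalid in the current encoding) suits your files is worth checking on a sample first.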

Eran Ben-Natan answered Oct 17 '22 07:10


This Perl one-liner worked for me, and it was also quite fast:

find . -type f -exec perl -MFile::Basename -e 'print (-T $_ ? "" : (fileparse ($_, qr/\.[^.]*/))[2] . "\n" ) for @ARGV' {} + | sort | uniq

And this is how you can find all binary files in the current folder and its subfolders:

find . -type f -exec perl -e 'print (-B $_ ? "$_\n" : "" ) for @ARGV' {} +

-T tests whether a file looks like text and -B whether it looks binary. Both are heuristics, and they give opposite answers except on empty files, where both return true*.

*perl file tests doc

Bijou Trouvaille answered Oct 17 '22 05:10


There is no difference between a binary file and a text file on Linux. The file utility looks at the contents and guesses. Unfortunately, it's not of much help because file doesn't produce a simple "binary or text" answer; it has a complex output with a large number of cases that you would have to parse.

One approach is to read a fixed-size prefix of the file, say 256 bytes, and then apply some heuristics. For instance, are all the byte values in the range 0x0 to 0x7F, with no control codes other than common whitespace? That suggests ASCII. If there are bytes 0x80 through 0xFF, does the entire buffer (except for one code at the end, which may be chopped) decode as valid UTF-8? And so on.
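The ASCII part of that heuristic can be sketched in shell roughly as follows. The function name looks_like_ascii is hypothetical, the 256-byte prefix is the figure from the paragraph above, and the UTF-8 check is left out because a character chopped at the end of the prefix needs extra handling.

```shell
# Hypothetical helper: succeed if the first 256 bytes are plain ASCII text.
# tr deletes tab, LF, CR and printable ASCII (octal 040-176); wc -c counts
# whatever is left. Zero leftover bytes means no control codes and no bytes
# in the range 0x80-0xFF. An empty file therefore counts as ASCII.
looks_like_ascii() {
    leftover=$(head -c 256 "$1" | LC_ALL=C tr -d '\11\12\15\40-\176' | wc -c)
    [ "$leftover" -eq 0 ]
}

# Usage: looks_like_ascii some_file && echo "probably ASCII text"
```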

One idea might be to sneakily exploit utilities which detect binary files, like GNU diff.

$ diff -r /bin/ls <(echo foo)
Binary files /bin/ls and /dev/fd/63 differ

It still works without process substitution:

$ diff -r /bin/ls /dev/null
Binary files /bin/ls and /dev/null differ

Now just grep the output of that and look for the word Binary.

The question is whether diff's heuristic for binary files works for your purposes.
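Wrapped in a function, the diff trick might look like the sketch below. The name is_binary_per_diff is hypothetical, and the pattern assumes the English "Binary files ... differ" message shown above, which is why the locale is forced to C.

```shell
# Hypothetical helper: succeed if diff reports the file as binary.
# The file is compared with /dev/null; LC_ALL=C keeps diff's message in
# English so that the grep pattern matches. An empty file produces no
# output at all and is therefore not reported as binary.
is_binary_per_diff() {
    LC_ALL=C diff /dev/null "$1" 2>/dev/null | grep -q '^Binary files'
}

# Usage: is_binary_per_diff /bin/ls && echo "binary according to diff"
```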

Kaz answered Oct 17 '22 05:10


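There is no sure way to differentiate a "text" file from a "binary" file; it is guesswork.

#!/bin/bash
# Guess whether "$1" is text: compare the printable bytes that strings finds
# in the first 4096 bytes with the total number of bytes read.
printable=$(head -c 4096 "$1" | strings -a -n 1 | wc -c)
total=$(head -c 4096 "$1" | wc -c)
# bc truncates the result to an integer; the 1.05 factor tolerates roughly 5% non-printable bytes.
guess=$(echo "$printable * 1.05 / $total" | bc)
if [ "$guess" -eq 1 ] ; then
    echo "$1 is text file"
    exit 0
else
    echo "$1 is binary file"
    exit 1
fi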

pizza answered Oct 17 '22 06:10