grepping for a large binary value from an even larger binary file

As the title suggests, I would like to grep a reasonably large (about 100MB) binary file for a binary string that is just under 5K.

I've tried grep with the -P option, but it only seems to return matches when the pattern is a few bytes long; when I go up to about 100 bytes it no longer finds any matches.

I've also tried bgrep. This worked well originally; however, when I extended the pattern to its current length, I just got "invalid/empty search string" errors.

The irony is that in Windows I can use HxD to search the file and it finds the pattern in an instant. What I really need, though, is a Linux command-line tool.

Thanks for your help,

Simon

Simon asked Nov 04 '22 18:11


1 Answer

Say we have a couple of big binary data files. For a big one that shouldn't match, we create a 100MB file whose contents are all NUL bytes.

dd bs=1M count=100 if=/dev/zero of=allzero.dat

For the one we want to match, create a hundred random megabytes.

#! /usr/bin/env perl

use warnings;

binmode STDOUT or die "$0: binmode: $!";

for (1 .. 100 * 1024 * 1024) {
  print chr rand 256;
}

Save it as mkrand, make it executable, and run it as ./mkrand >myfile.dat.

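Finally, extract a known match into a file named pattern. With dd's default 512-byte block size, the command below skips 42 blocks and copies the next 10, which gives a 5120-byte pattern.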

dd skip=42 count=10 if=myfile.dat of=pattern
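To make the block arithmetic explicit, here is a minimal sketch. The helper name dd_span is hypothetical (it is not part of dd); it only prints the byte offset and length that a given skip/count pair selects, assuming the same block size for input and output.

```shell
# Hypothetical helper, not part of dd: print the byte offset and length that
# "dd skip=SKIP count=COUNT bs=BS" would select. BS defaults to 512,
# which is dd's default block size.
dd_span() {
  skip=$1
  count=$2
  bs=${3:-512}
  echo "offset=$((skip * bs)) length=$((count * bs))"
}

dd_span 42 10   # prints: offset=21504 length=5120
```

Running wc -c pattern after the dd command should report the same length.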

I assume you want only the files that match (-l) and want your pattern to be treated literally (-F or --fixed-strings). I suspect you may have been running into a length limit with -P.

You may be tempted to use the --file=PATTERN-FILE option, but grep interprets the contents of PATTERN-FILE as newline-separated patterns, so in the likely case that your 5KB pattern contains newlines, grep will treat it as several shorter patterns rather than one long one.
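The splitting behaviour is easy to see on a toy case. This is a small sketch using throwaway files in a temporary directory (the file names are made up for the demo); -l, -F and -f are the short forms of --files-with-matches, --fixed-strings and --file.

```shell
# Demo with made-up file names: a pattern file meant as ONE 7-byte pattern
# ("foo", newline, "bar") is read by grep as TWO patterns, "foo" and "bar".
demo_dir=$(mktemp -d)
printf 'foo\nbar' > "$demo_dir/pattern"
printf 'only bar here\n' > "$demo_dir/haystack"

# The haystack does not contain "foo<newline>bar", yet grep lists it,
# because the line matches the second pattern, "bar".
matched=$(grep -l -F -f "$demo_dir/pattern" "$demo_dir/haystack")
echo "$matched"

rm -rf "$demo_dir"
```

With a random 5KB pattern, the same effect would turn one long pattern into a handful of short ones, each of which is far more likely to match by accident.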

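So hope your system's ARG_MAX is big enough and pass the pattern as a command-line argument instead. Be sure to quote the command substitution so the shell does not word-split the contents of pattern. Note that a command-line argument cannot hold NUL bytes and that grep also splits a command-line pattern at newlines, so the match is exact only when the pattern contains neither. For example: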

$ grep -l --fixed-strings "$(cat pattern)" allzero.dat myfile.dat
myfile.dat
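
Before relying on that command, it may be worth checking the pattern file first. The following is a rough sketch, not part of grep: the helper names count_byte and pattern_is_safe are hypothetical, and it assumes getconf is available to report ARG_MAX.

```shell
# Hypothetical helpers; a sketch of pre-flight checks, not part of grep.

# Count how often one byte (given as a tr escape such as '\000') occurs in a file.
count_byte() {
  tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

# Succeed only if the file contains no NUL and no newline bytes
# and is smaller than the system's ARG_MAX (assumes getconf exists).
pattern_is_safe() {
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$(count_byte '\000' "$1")" -eq 0 ] &&
  [ "$(count_byte '\n' "$1")" -eq 0 ] &&
  [ "$size" -lt "$(getconf ARG_MAX)" ]
}
```

One possible use is pattern_is_safe pattern && grep -l --fixed-strings "$(cat pattern)" allzero.dat myfile.dat, so that grep only runs when the shell can pass the pattern through unchanged.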

Greg Bacon answered Nov 15 '22 05:11