As the title suggests I would like to grep a reasonably large (about 100MB) binary file, for a binary string - this binary string is just under 5K.
I've tried grep with the -P option, but it only returns matches when the pattern is a few bytes long; once I go up to about 100 bytes it no longer finds anything.
I've also tried bgrep. It worked well originally; however, when I extended the pattern to the length I have now, I just got "invalid/empty search string" errors.
The irony is that in Windows I can use HxD to search the file and it finds the match in an instant. What I really need, though, is a Linux command-line tool.
Thanks for your help,
Simon
Say we have a couple of big binary data files. For a big one that shouldn't match, we create a 100MB file whose contents are all NUL bytes.
dd ibs=1 count=100M if=/dev/zero of=allzero.dat
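Copying one byte at a time makes dd painfully slow; a sketch of an equivalent (my variant, not from the answer above) that writes the same file in 1 MB blocks:

```shell
# Same 100 MB of NUL bytes, written in 1 MB blocks rather than byte by byte.
dd bs=1M count=100 if=/dev/zero of=allzero.dat
```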
For the one we want to match, create a hundred random megabytes.
#! /usr/bin/env perl
# mkrand: write 100 MB of random bytes to standard output.
use strict;
use warnings;

binmode STDOUT or die "$0: binmode: $!";  # raw bytes, no newline translation
for (1 .. 100 * 1024 * 1024) {
    print chr rand 256;
}
Execute it as ./mkrand >myfile.dat.
Finally, extract a known match into a file named pattern.
dd skip=42 count=10 if=myfile.dat of=pattern
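As a sanity check (my addition; the file names sample.dat and slice are throwaway), cmp can confirm the extracted slice really equals the corresponding bytes of the source. dd's default block size is 512 bytes, so skip=42 count=10 means byte offset 42*512 = 21504 and length 10*512 = 5120 bytes, i.e. just under 5K:

```shell
# Throwaway demo at smaller scale: random source, extract a slice, verify it.
head -c 1000000 /dev/urandom > sample.dat        # 1 MB of random bytes
dd skip=42 count=10 if=sample.dat of=slice 2>/dev/null
# Compare sample.dat (skipping 21504 bytes) against slice, limited to 5120 bytes.
cmp --ignore-initial=21504:0 --bytes=5120 sample.dat slice && echo "slice matches"
```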
I assume you want only the files that match (-l) and want your pattern to be treated literally (-F or --fixed-strings). I suspect you may have been running into a length limit with -P.
You may be tempted to use the --file=PATTERN-FILE option, but grep interprets the contents of PATTERN-FILE as newline-separated patterns, so in the likely case that your 5KB pattern contains newlines, you'll hit an encoding problem.
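A tiny demonstration of that splitting behaviour (the file names pat and data are mine): grep -f treats each line of the pattern file as a separate pattern, so an intended two-line pattern matches either line on its own:

```shell
# The intended pattern foo<newline>bar becomes TWO patterns, "foo" and "bar".
printf 'foo\nbar' > pat
printf 'only bar here\n' > data      # contains "bar" but never foo\nbar
grep -lF -f pat data                 # lists data anyway
```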
So hope your system's ARG_MAX is big enough and go for it. Be sure to quote the contents of pattern. For example:
$ grep -l --fixed-strings "$(cat pattern)" allzero.dat myfile.dat
myfile.dat
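For a quick end-to-end run at reduced scale (all file names and sizes here are my own, and the random data is restricted to ASCII because command substitution drops NUL bytes, so a truly arbitrary binary pattern needs extra care):

```shell
# Miniature version of the recipe above.
dd bs=1024 count=100 if=/dev/zero of=zeros.dat 2>/dev/null    # shouldn't match
head -c 600000 /dev/urandom | tr -cd 'A-Za-z0-9' | head -c 100000 > rand.dat
dd skip=4 count=2 if=rand.dat of=pat 2>/dev/null              # 1024 bytes at offset 2048
grep -l --fixed-strings "$(cat pat)" zeros.dat rand.dat       # lists only rand.dat
```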