
"grep" offset of ascii string from binary file

I'm generating binary data files that are simply a series of records concatenated together. Each record consists of a binary header followed by binary data. Within the binary header is an ASCII string 80 characters long. Somewhere along the way, my process of writing the files got a little messed up, and I'm trying to debug the problem by inspecting how long each record actually is.

This question seems extremely related, but I don't understand Perl, so I haven't been able to get the accepted answer there to work. The other answer points to bgrep, which I've compiled, but it wants me to feed it a hex string. I'd rather have a tool that takes the ASCII string, finds it in the binary data, and prints the string along with the byte offset where it was found.

In other words, I'm looking for some tool which acts like this:

tool foobar filename 

or

tool foobar < filename 

and its output is something like this:

foobar:10
foobar:410
foobar:810
foobar:1210
...

i.e., the string that matched and the byte offset in the file where the match started. In this example, I can infer that each record is 400 bytes long.

Other constraints:

  • ability to search by regex is cool, but I don't need it for this problem
  • my binary files are big (3.5 GB), so I'd like to avoid reading the whole file into memory if possible
mgilson asked Jan 03 '13 14:01




1 Answer

grep --byte-offset --only-matching --text foobar filename 

The --byte-offset option prints the byte offset of each matching line.

The --only-matching option makes grep print the offset of each match itself instead of the offset of the line containing it.

The --text option makes grep treat the binary file as a text file.

You can shorten it to:

grep -oba foobar filename 
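
With a single input file, GNU grep prints each hit as offset:match (for example 10:foobar), which is the reverse of the order asked for in the question. As a minimal sketch, assuming that output has been captured as text lines, the gaps between consecutive offsets give the record lengths. The function name record_lengths and the sample lines are illustrative only, not part of grep.

```python
def record_lengths(grep_lines):
    """Return the gaps between consecutive offsets in `grep -oba` style lines.

    Each line is assumed to look like "<offset>:<match>", e.g. "10:foobar".
    """
    offsets = [int(line.split(":", 1)[0]) for line in grep_lines if line.strip()]
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


# Sample lines mirroring the offsets used in the question.
sample = ["10:foobar", "410:foobar", "810:foobar", "1210:foobar"]
print(record_lengths(sample))  # [400, 400, 400]
```

A gap that differs from the others would point at the record where the writing process went wrong.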

This works with GNU grep, which ships with Linux by default. It won't work with BSD grep (which ships with macOS by default).
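
Where GNU grep is not available, a small script can do the same job. The following is a rough sketch, not a replacement for grep: it reads the file in fixed-size chunks so memory use stays bounded for multi-gigabyte files, and it carries the last len(needle) - 1 bytes over to the next chunk so that a match straddling a chunk boundary is still found. The name find_offsets, the chunk_size default, and the demo file are all assumptions made for illustration. Unlike grep -o, it also reports overlapping matches.

```python
import os
import tempfile


def find_offsets(path, needle, chunk_size=1 << 20):
    """Yield the byte offset of every occurrence of `needle` (bytes) in a file."""
    if not needle:
        raise ValueError("needle must be non-empty")
    overlap = len(needle) - 1
    tail = b""  # bytes carried over from the previous chunk
    base = 0    # file offset of the first byte of `tail + chunk`
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            pos = buf.find(needle)
            while pos != -1:
                yield base + pos
                pos = buf.find(needle, pos + 1)
            # Keep just enough bytes to catch a match split across chunks.
            tail = buf[-overlap:] if overlap else b""
            base += len(buf) - len(tail)


# Demo on a small synthetic file: two 400-byte records with the string at byte 10.
record = b"\x00" * 10 + b"foobar" + b"\xff" * 384
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(record * 2)
try:
    for offset in find_offsets(tmp.name, b"foobar"):
        print(f"foobar:{offset}")  # prints foobar:10 then foobar:410
finally:
    os.remove(tmp.name)
```

The output uses the string:offset order from the question rather than grep's offset:string order.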

Hari Menon answered Sep 19 '22 12:09