
Binary grep on Linux?

Tags: linux, grep, binary

Say I have generated the following binary file:

# generate file:
python -c 'import sys;[sys.stdout.write(chr(i)) for i in (0,0,0,0,2,4,6,8,0,1,3,0,5,20)]' > mydata.bin

# get file size in bytes
stat -c '%s' mydata.bin
# 14

And say, I want to find the locations of all zeroes (0x00), using a grep-like syntax.

 

The best I can do so far is:

$ hexdump -v -e "1/1 \" %02x\n\"" mydata.bin | grep -n '00'
1: 00
2: 00
3: 00
4: 00
9: 00
12: 00

However, this implicitly converts each byte in the original binary file into a multi-byte ASCII representation, on which grep operates; not exactly the prime example of optimization :)

Is there something like a binary grep for Linux? Possibly, also, something that would support a regular expression-like syntax, but also for byte "characters" - that is, I could write something like 'a(\x00*)b' and match 'zero or more' occurrences of byte 0 between bytes 'a' (97) and 'b' (98)?

EDIT: The context is that I'm working on a driver, where I capture 8-bit data; something goes wrong in the data, which can be anywhere from kilobytes to megabytes in size, and I'd like to check for particular signatures and where they occur. (So far I'm working with kilobyte snippets, so optimization is not that important; but if I start getting errors in megabyte-long captures and need to analyze those, my guess is I would like something more optimized :) And especially, I'd like something where I can "grep" for a byte as a character; hexdump forces me to search strings per byte.)

EDIT2: same question, different forum :) grepping through a binary file for a sequence of bytes

EDIT3: Thanks to the answer by @tchrist, here is also an example with 'grepping' and matching, and displaying results (although not quite the same question as OP):

$ perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin

ca000000cb000000cc000000cd000000ce     # Matched data (hex)
66357                                  # Offset (dec)

To have the matched data grouped as one byte (two hex characters) each, "H2 H2 H2 ..." needs to be specified for as many bytes as there are in the matched string; as my match '.....\0\0\0\xCC\0\0\0.....' covers 17 bytes, I can write '"H2"x17' in Perl. Each of these "H2" returns a separate value (as in a list), so join also needs to be used to add spaces between them. Eventually:

$ perl -ln0777e 'print join(" ", unpack("H2 "x17,$1)), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin

ca 00 00 00 cb 00 00 00 cc 00 00 00 cd 00 00 00 ce
66357
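For comparison, roughly the same match-and-print can be sketched in Python 3. This is only a minimal sketch, assuming Python 3.8+ (for `bytes.hex` with a separator); `grep_bytes` and the sample data are made up for illustration and stand in for the real capture file. Note that Perl's pos() reports the offset just past the end of the match, which corresponds to `m.end()` below.

```python
import re

def grep_bytes(data: bytes, pattern: bytes):
    """Yield (start_offset, end_offset, spaced_hex) for each match.

    Hypothetical helper, not part of any library.
    """
    # re.DOTALL lets '.' match every byte value, including 0x0a (newline)
    for m in re.finditer(pattern, data, flags=re.DOTALL):
        yield m.start(), m.end(), m.group(0).hex(" ")

# Made-up sample data: 16 filler bytes, the 17-byte signature, 4 zero bytes
sample = (
    bytes(range(0x10))
    + b"\xca\x00\x00\x00\xcb\x00\x00\x00\xcc\x00\x00\x00\xcd\x00\x00\x00\xce"
    + bytes(4)
)

# Same shape as the Perl pattern: 5 bytes, 00 00 00 cc 00 00 00, 5 bytes
signature = rb".{5}\x00\x00\x00\xcc\x00\x00\x00.{5}"

for start, end, hexed in grep_bytes(sample, signature):
    print(hexed)       # matched data, one byte per hex pair
    print(start, end)  # offset of the match start, and just past its end
```

For a real file, the data would come from something like `open(path, "rb").read()` instead of the in-memory sample.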

Well... indeed Perl is a very nice 'binary grepping' facility, I must admit :) As long as one learns the syntax properly :)

asked Nov 14 '10 22:11 by sdaau



1 Answer

This seems to work for me:

grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file> 

Short form:

grep -obUaP "<\x-hex pattern>" <file> 

Example:

grep -obUaP "\x01\x02" /bin/grep 

Output (Cygwin binary):

153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>

So you can grep this again to extract offsets. But don't forget to use binary mode again.
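If a scripting language is an option, the same offset listing can be sketched in Python 3, where regular expressions work directly on bytes. This is just a minimal sketch, not a replacement for grep; `byte_offsets` is a hypothetical helper name, and the data is the 14-byte mydata.bin content from the question, built in memory here rather than read from disk.

```python
import re

def byte_offsets(data: bytes, pattern: bytes):
    """Return (offset, matched_bytes) pairs, similar in spirit to `grep -ob`.

    Hypothetical helper, not part of any library.
    """
    # re.DOTALL so that '.' in a pattern would also match the newline byte
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, data, re.DOTALL)]

# The same 14 bytes as mydata.bin in the question
data = bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20])

# 0-based offsets of every 0x00 byte
zero_offsets = [offset for offset, _ in byte_offsets(data, rb"\x00")]
print(zero_offsets)

# 'a', then zero or more 0x00 bytes, then 'b' (the pattern from the question)
ab_matches = byte_offsets(b"xa\x00\x00b--ab", rb"a\x00*b")
print(ab_matches)
```

Unlike the 1-based line numbers from the hexdump pipeline in the question, these offsets are 0-based. For a real file, read it with `open(path, "rb").read()` first.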

answered Sep 19 '22 00:09 by Fr0sT