Say I have generated the following binary file:
# generate file:
python -c 'import sys;[sys.stdout.write(chr(i)) for i in (0,0,0,0,2,4,6,8,0,1,3,0,5,20)]' > mydata.bin
# get file size in bytes
stat -c '%s' mydata.bin
# 14
And say I want to find the locations of all zeroes (0x00), using a grep-like syntax.
The best I can do so far is:
$ hexdump -v -e "1/1 \" %02x\n\"" mydata.bin | grep -n '00'
1: 00
2: 00
3: 00
4: 00
9: 00
12: 00
However, this implicitly converts each byte in the original binary file into a multi-byte ASCII representation, on which grep operates; not exactly the prime example of optimization :)
Is there something like a binary grep for Linux? Possibly, also, something that would support a regular expression-like syntax, but for byte "characters" - that is, I could write something like 'a(\x00*)b' and match 'zero or more' occurrences of byte 0 between bytes 'a' (97) and 'b' (98)?
EDIT: The context is that I'm working on a driver, where I capture 8-bit data; something goes wrong in the data, which can be kilobytes up to megabytes in size, and I'd like to check for particular signatures and where they occur. (So far, I'm working with kilobyte snippets, so optimization is not that important - but if I start getting errors in megabyte-long captures and need to analyze those, my guess is I would like something more optimized :) And especially, I'd like something where I can "grep" for a byte as a character - hexdump forces me to search strings per byte.)
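For comparison, here is a minimal Python sketch of the kind of byte-level matching asked for above. It is not one of the tools discussed here; it only assumes Python 3's standard `re` module, which accepts bytes patterns. The `sample` input is made up for illustration. Note that the offsets are 0-based, whereas the `grep -n` line numbers from the hexdump pipeline are 1-based.

```python
import re

# The same 14 bytes that the generator command above writes to mydata.bin.
data = bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20])

# 0-based offsets of every zero byte; the pattern is a bytes regex.
zero_offsets = [m.start() for m in re.finditer(rb"\x00", data)]
print(zero_offsets)  # [0, 1, 2, 3, 8, 11]

# 'a', then zero or more NUL bytes, then 'b'; the group captures the NUL run.
sample = b"xa\x00\x00by ab"  # hypothetical input, not from the question
for m in re.finditer(rb"a(\x00*)b", sample):
    # Print the match offset and how many NUL bytes sat between 'a' and 'b'.
    print(m.start(), len(m.group(1)))
```

The pattern works on raw bytes directly, so no intermediate hex text is generated.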
EDIT2: same question, different forum :) grepping through a binary file for a sequence of bytes
EDIT3: Thanks to the answer by @tchrist, here is also an example with 'grepping' and matching, and displaying results (although not quite the same question as OP):
$ perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin
ca000000cb000000cc000000cd000000ce # Matched data (hex)
66357 # Offset (dec)
To have the matched data grouped as one byte (two hex characters) each, "H2 H2 H2 ..." needs to be specified for as many bytes as there are in the matched string; as my match '.....\0\0\0\xCC\0\0\0.....' covers 17 bytes, I can write '"H2"x17' in Perl. Each of these "H2" will return a separate variable (as in a list), so join also needs to be used to add spaces between them - eventually:
$ perl -ln0777e 'print join(" ", unpack("H2 "x17,$1)), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin
ca 00 00 00 cb 00 00 00 cc 00 00 00 cd 00 00 00 ce
66357
Well... indeed, Perl is a very nice 'binary grepping' facility, I must admit :) As long as one learns the syntax properly :)
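For readers more at home in Python, the same search can be sketched roughly as follows. This is an illustrative analogue, not a translation endorsed by the answer: the function name `find_signature` and the `capture` data are hypothetical, and `re.DOTALL` is used so that "." matches any byte, including 0x0A. Perl's pos() reports the offset just past the match, which corresponds to `m.end()` here; the start offset is printed as well.

```python
import re

# Same signature as the Perl pattern: 5 bytes of context, 00 00 00 CC 00 00 00,
# then 5 more bytes. re.DOTALL lets "." match any byte, including 0x0A.
SIGNATURE = re.compile(rb"(.{5}\x00\x00\x00\xCC\x00\x00\x00.{5})", re.DOTALL)


def find_signature(data):
    """Return (start, end, spaced_hex) for each match in a bytes object."""
    results = []
    for m in SIGNATURE.finditer(data):
        # Two hex digits per byte, separated by spaces (like "H2 " x 17).
        spaced_hex = " ".join(f"{b:02x}" for b in m.group(1))
        results.append((m.start(), m.end(), spaced_hex))
    return results


# Hypothetical capture: 10 filler bytes, the 17-byte window, 3 filler bytes.
window = bytes.fromhex("ca000000cb000000cc000000cd000000ce")
capture = b"\xff" * 10 + window + b"\xff" * 3

for start, end, spaced_hex in find_signature(capture):
    print(spaced_hex)  # matched data (hex)
    print(start, end)  # start offset, and offset just past the match
```

For a real capture, `open(path, "rb").read()` would supply the bytes object in place of `capture`.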
To force GNU grep to output lines even from files that appear to be binary, use the -a or '--binary-files=text' option. To eliminate the "Binary file matches" messages, use the -I or '--binary-files=without-match' option, or the -s or --no-messages option.
If type is 'text', grep processes binary data as if it were text; this is equivalent to the -a option. When type is 'binary', grep may treat non-text bytes as line terminators even without the -z (--null-data) option. This means choosing 'binary' versus 'text' can affect whether a pattern matches a file.
As this answer notes, there are two cases where grep thinks your file is binary: if there's an encoding error detected, or if it detects some NUL bytes. Both of these sound at least conceptually simple, but it turns out that grep tries to be clever about detecting NULs.
To search for a sequence of bytes, rather than a text string, select the “binary data” search type. You can then enter the bytes into the search box as you would enter them into a hex editor. PowerGREP's regular expression support works equally well with binary files as with text files.
This seems to work for me:
grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>
Short form:
grep -obUaP "<\x-hex pattern>" <file>
Example:
grep -obUaP "\x01\x02" /bin/grep
Output (Cygwin binary):
153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>
So you can grep this again to extract offsets. But don't forget to use binary mode again.
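If the offsets are needed inside a script rather than on the command line, a similar result can be sketched in Python. This is an assumption-laden illustration, not a replacement for grep: the helper name `find_offsets` is hypothetical, and it relies on the standard `mmap` module so that a large capture does not have to be read into memory in one piece.

```python
import mmap
import os
import re
import tempfile


def find_offsets(path, pattern):
    """Return (offset, matched_bytes) for every non-overlapping match in the file."""
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so treat that case as "no matches".
        if os.fstat(f.fileno()).st_size == 0:
            return []
        # Map the file read-only; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [(m.start(), m.group()) for m in re.finditer(pattern, mm)]


# Demo on a throwaway file holding the 14 sample bytes from the question.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20]))
    sample_path = tmp.name

try:
    for offset, matched in find_offsets(sample_path, rb"\x01\x03"):
        print(f"{offset}: {matched.hex()}")
finally:
    os.remove(sample_path)
```

As with `grep -ob`, the reported offsets are 0-based byte positions of the start of each match.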