
How to grep a text file which contains some binary data?

Tags:

shell

People also ask

Can you grep a binary file?

With grep's --binary-files=TYPE option, TYPE defaults to binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option.

Why does grep think my text file is binary?

There are two cases where grep thinks your file is binary: if it detects an encoding error, or if it detects some NUL bytes. Both of these sound at least conceptually simple, but it turns out that grep tries to be clever about detecting NULs.

How do I search a binary file?

To search for a sequence of bytes, rather than a text string, select the “binary data” search type. You can then enter the bytes into the search box as you would enter them into a hex editor. PowerGREP's regular expression support works equally well with binary files as with text files.


grep -a

It can't get simpler than that.
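As a minimal sketch of what -a changes, the snippet below builds a throwaway sample file (its contents and the search word foo are made up for illustration) and searches it with and without the option. The wording of the "binary file matches" notice differs between grep versions, so only the -a result is relied on here.

```shell
# Build a small sample log: two text lines around one line holding a NUL byte.
# (The file contents are hypothetical test data.)
sample=$(mktemp)
printf 'line1 foo\nline2 \000 bar\nline3 foo\n' > "$sample"

# Without -a, GNU grep typically reports only that a binary file matches.
grep 'foo' "$sample"

# With -a (long form: --text) the file is treated as text, so the
# matching lines themselves are printed.
matches=$(grep -a 'foo' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Note that -a prints matching lines verbatim, so a match on a line that itself contains binary bytes would send those bytes to the terminal.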


One way is to simply treat binary files as text anyway, with grep --text, but this may well result in binary information being sent to your terminal. That's not really a good idea if you're running a terminal that interprets the output stream (such as VT/DEC or many others).

Alternatively, you can send your file through tr with the following command:

tr '\000-\011\013-\037\177-\377' '.' <test.log | grep whatever

This will change anything less than a space character (except newline) and anything greater than 126, into a . character, leaving only the printables.
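To see the effect on a concrete input, here is a small sketch that feeds a made-up string through the same kind of tr filter. The sample bytes are hypothetical, and LC_ALL=C is an added assumption so that tr works on single bytes regardless of the current locale.

```shell
# Sample input: a tab (octal 011), a NUL (000), an escape (033), then a newline.
# Every control byte except the newline should come out as a dot.
cleaned=$(printf 'ab\tc\000d\033e\n' | LC_ALL=C tr '\000-\011\013-\037\177-\377' '.')
printf '%s\n' "$cleaned"   # prints: ab.c.d.e
```

Because each unwanted byte is replaced one-for-one, column positions in the output still line up with the original data.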


If you want every "illegal" character replaced by a different one, you can use something like the following C program, a classic standard input filter:

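#include <stdio.h>

int main (void) {
    int ch;
    /* Copy standard input to standard output one character at a time. */
    while ((ch = getchar()) != EOF) {
        /* Newlines and printable ASCII (space through tilde) pass through. */
        if ((ch == '\n') || ((ch >= ' ') && (ch <= '~'))) {
            putchar (ch);
        } else {
            /* Anything else is shown as its two-digit hex code. */
            printf ("{{%02x}}", ch);
        }
    }
    return 0;
}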

This will give you {{NN}}, where NN is the hex code for the character. You can simply adjust the printf for whatever style of output you want.

You can see that program in action here, where it replaces the tab character with {{09}}:

pax$ printf 'Hello,\tBob\nGoodbye, Bob\n' | ./filterProg
Hello,{{09}}Bob
Goodbye, Bob

You could run the data file through cat -v, e.g.:

$ cat -v tmp/test.log | grep re
line1 re ^@^M
line3 re^M

which could then be further post-processed to remove the junk; this is most analogous to your query about using tr for the task.

-v simply tells cat to display non-printing characters.
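The post-processing step could look something like the sketch below, which strips the ^@ and trailing ^M markers with sed after the grep. The sample input and the sed expressions are assumptions for illustration, not part of the original log.

```shell
# Sample data: CRLF line endings plus a stray NUL byte (hypothetical input).
# cat -v turns NUL into ^@ and CR into ^M; sed then removes those markers.
cleaned=$(printf 'line1 re\000\r\nline2\nline3 re\r\n' \
  | cat -v \
  | grep 're' \
  | sed -e 's/\^@//g' -e 's/\^M$//')
printf '%s\n' "$cleaned"
```

One caveat: a line that genuinely contains the two characters ^ and @ (or ends in ^M as plain text) would be altered as well, since after cat -v they are indistinguishable from the markers.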


You can use "strings" to extract strings from a binary file, for example:

strings binary.file | grep foo