
How to make grep separate output by NULL characters?

Suppose we are doing a multiline regex pattern search on a bunch of files and we want to extract the matches from grep. By default, grep separates matches with newlines, but because the patterns are multiline, the matches can contain newlines themselves, so we cannot easily extract the individual matches.

Example

grep -rzPIho '}\n\n\w\w\b' | od -a

Depending on the files in your file tree, this may yield output like:

0000000   }  nl  nl   m   y  nl   }  nl  nl   i   f  nl   }  nl  nl   m
0000020   y  nl   }  nl  nl   m   y  nl   }  nl  nl   i   f  nl   }  nl
0000040  nl   m   y  nl
0000044

As you can see, we cannot split on newlines to obtain the matches for further processing, since the matches contain newline characters themselves.

What doesn't work

The --null (or -Z) option only changes the byte written after file names (for example with -l, which makes grep list file names instead of matches), so it doesn't help here.
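To illustrate why -Z does not help, here is a minimal sketch. The temporary directory, file names, and file contents are made up for the example; only the grep options come from the question.

```shell
# Hypothetical sample files; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'foo\n}\n\nmy\n' > "$tmpdir/a.txt"
printf 'bar\n}\n\nif\n' > "$tmpdir/b.txt"

# -l lists the matching files; -Z terminates each file name with a NUL
# byte instead of a newline. No match text is printed at all.
grep -rlZ '}' "$tmpdir" | od -a

# Counting the NUL bytes therefore counts files, not matches.
file_count=$(grep -rlZ '}' "$tmpdir" | tr -cd '\0' | wc -c)
echo "files listed: $file_count"

rm -rf "$tmpdir"
```

The NUL bytes here delimit file names, which is useful for piping into xargs -0 but says nothing about where one match ends and the next begins.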

Note: this is not a duplicate of "Is there a grep equivalent for find's -print0 and xargs's -0 switches?", because the requirements in that question are different, allowing it to be answered using alternative techniques.

So, how can we make this work? Maybe use grep in conjunction with other tools?

chtenb asked Mar 12 '23 21:03



2 Answers

I filed this issue as a feature request on the GNU grep bug mailing list, and it turned out to be a bug in the code.

It has been fixed and pushed to master, so it will be available in the next release of GNU grep: http://git.savannah.gnu.org/cgit/grep.git/commit/?id=cce2fd5520bba35cf9b264de2f1b6131304f19d2

To summarize: this patch makes sure that the -z flag not only works in conjunction with -l, but also with -o.
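With a grep build that includes this fix, the matches can be split on NUL. The following is a minimal sketch, assuming GNU grep with the patch applied and with -P (PCRE) support; the temporary directory and the sample file contents are hypothetical.

```shell
# Assumption: GNU grep that includes the fix and supports -P.
# Hypothetical sample input containing two occurrences of the pattern.
tmpdir=$(mktemp -d)
printf 'a {\n}\n\nmy $x;\nb {\n}\n\nif (1) {\n}\n' > "$tmpdir/sample.txt"

# Each match is now terminated by a NUL byte, so counting NULs
# gives the number of matches.
count=$(grep -rzPIho '}\n\n\w\w\b' "$tmpdir" | tr -cd '\0' | wc -c)
echo "matches: $count"

# For display, turn the newlines inside each match into spaces and
# the NUL terminators into newlines: one match per output line.
one_per_line=$(grep -rzPIho '}\n\n\w\w\b' "$tmpdir" | tr '\n\0' ' \n')
printf '%s\n' "$one_per_line"

rm -rf "$tmpdir"
```

In bash, the same stream can be consumed one match at a time with a loop such as `while IFS= read -r -d '' match; do ...; done`, which keeps the embedded newlines intact.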

chtenb answered Mar 15 '23 10:03



What comes to mind is to use a group separator, for example something like:

grep -rzPIHo '}\n\n\w\w\b' "$FILE" | sed "s/^$FILE:/\x0/"
bufh answered Mar 15 '23 09:03
