
grep through colored text, e.g. gcc | colorgcc | grep regexp

Tags:

regex

grep

perl

How do I make grep respect ANSI color escapes when grepping piped output? I am happy to use something else (Perl?) instead of grep.

My use case: I want

 gcc foobar.c | colorgcc | grep regexp
 ls --color | grep filename

to work nicely with colors (on a Unix terminal using ANSI escapes).

Test examples of the behaviour I want:

echo -e "he\e[35mllo\e[00m" world |grep hell ==> he\e[35mllo\e[00m world 
echo -e "\e[35m removed line\nhello\e[00m" world |grep hell ==> \e[35mhello\e[00m world
echo -e "\e[35m rem\e[1moved line\nhello\e[00m" world | grep hell ==> \e[35m\e[1mhello\e[00m world

Currently the first command prints nothing, and the second prints the uncolored string 'hello\e[00m world'.

Here \e[35m and \e[00m are color (attribute) modifiers: the color of a letter is determined by the color (attribute) escape sequences that precede it, each of the form \e[P1;P2;...m where P1, P2, etc. are sequences of digits, and \e[P1m\e[P2m is equivalent to \e[P1;P2m. \e[0m restores the default color and forgets all previous \e[...m sequences: \e[34m\e[0m is equivalent to \e[0m.

There are several independent attributes (boldness, background color, foreground/letter color), and each number in an escape sequence affects only one of them. Thus \e[1m\e[35m is equivalent to \e[1;35m (and to \e[35;1m), but not to \e[35m alone; however, \e[34m\e[35m is equivalent to \e[35m because both affect the same attribute (namely, the foreground/letter color).
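A quick way to reproduce the first case, and to confirm that the escape bytes are the only obstacle, is to strip the SGR sequences before grepping. This is a sketch that assumes bash and GNU sed (for \x1b in the regex); strip_sgr is a hypothetical helper name, and stripping of course throws the colors away, which is exactly what the question wants to avoid:

```shell
#!/usr/bin/env bash
# Assumes bash and GNU sed (which understands \x1b in a regex).

# Hypothetical helper: delete SGR sequences (ESC [ digits/semicolons m).
strip_sgr() {
  sed 's/\x1b\[[0-9;]*m//g'
}

# Same text as the first example: the escape sits between "he" and "llo".
sample=$(printf 'he\033[35mllo\033[00m world')

# A plain grep sees "he<ESC>[35mllo", so "hell" never matches.
printf '%s\n' "$sample" | grep -c hell || true   # grep exits 1 here
# 0

# With the escapes removed, "hell" matches, but the colors are gone too.
printf '%s\n' "$sample" | strip_sgr | grep hell
# hello world
```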

John Quilder asked Feb 06 '13 21:02



1 Answer

This is a really interesting problem; here is what I came up with. It is pretty ugly, but it seems to get the job done:

# Requires GNU sed (for \x1b) and bash (for <<<); the grep pattern is quoted so the shell cannot glob-expand it.
sed -n '1s/^/\x1b[0m/;H;x;s/\n//;p;s/.*\(\x1b\[[0-9;]*m\).*/\1/;h' |
  grep "$(sed 's/./&\\(\x1b\\[[0-9;]*m\\)*/g' <<< hell)"

The term you are searching for goes at the very end (in place of "hell"). Here are a few examples with the text you provided (using hexdump -C to show the escape bytes):

$ echo -e "he\e[35mllo\e[00m" world |
> sed -n '1s/^/\x1b[0m/;H;x;s/\n//;p;s/.*\(\x1b\[[0-9]*m\(;[0-9]*m\)*\).*/\1/;h' |
> grep `sed 's/./\0\\\\(\x1b\\\\[[0-9]*m\\\\(;[0-9]*m\\\\)*\\\\)*/g' <<< hell` |
> hexdump -C
00000000  1b 5b 30 6d 68 65 1b 5b  33 35 6d 6c 6c 6f 1b 5b  |.[0mhe.[35mllo.[|
00000010  30 30 6d 20 77 6f 72 6c  64 0a                    |00m world.|
0000001a

$ echo -e "\e[35m removed line\nhello\e[00m" world |
> sed -n '1s/^/\x1b[0m/;H;x;s/\n//;p;s/.*\(\x1b\[[0-9]*m\(;[0-9]*m\)*\).*/\1/;h' |
> grep `sed 's/./\0\\\\(\x1b\\\\[[0-9]*m\\\\(;[0-9]*m\\\\)*\\\\)*/g' <<< hell` |
> hexdump -C
00000000  1b 5b 33 35 6d 68 65 6c  6c 6f 1b 5b 30 30 6d 20  |.[35mhello.[00m |
00000010  77 6f 72 6c 64 0a                                 |world.|
00000016

The first sed command prepends the most recently seen color escape sequence to the beginning of each line (and a \x1b[0m reset to the first line), which is necessary for your second example, where the color is set on a line that grep will skip. The sed command that is the argument to grep builds a regex that matches any number of color escapes after each character of the search term.
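To check what the inner sed actually hands to grep, the generated pattern can be printed through cat -v, which shows the ESC byte as ^[. This sketch assumes bash and GNU sed; it uses a [0-9;]* parameter class so that multi-parameter sequences such as \e[1;35m are covered too, and the variable name pattern is only for illustration. It also quotes the pattern, because an unquoted $(...) or backtick result undergoes pathname expansion and [0-9;]* could match file names:

```shell
#!/usr/bin/env bash
# Assumes bash (for <<<) and GNU sed (for \x1b in the replacement).

# After every character of "hell", allow zero or more SGR sequences.
pattern=$(sed 's/./&\\(\x1b\\[[0-9;]*m\\)*/g' <<< hell)

# cat -v prints the ESC byte as ^[ so the generated BRE is readable.
printf '%s\n' "$pattern" | cat -v
# h\(^[\[[0-9;]*m\)*e\(^[\[[0-9;]*m\)*l\(^[\[[0-9;]*m\)*l\(^[\[[0-9;]*m\)*

# Keep the pattern quoted so the shell cannot glob-expand [0-9;]* or *.
printf 'he\033[35mllo\033[00m world\n' | grep "$pattern" | cat -v
# he^[[35mllo^[[00m world
```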

Here is the egrep (grep -E) version:

sed -n '1s/^/\x1b[0m/;H;x;s/\n//;p;s/.*\(\x1b\[[0-9;]*m\).*/\1/;h' |
  grep -E "$(sed 's/./&(\x1b\\[[0-9;]*m)*/g' <<< hell)"
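The sed script above carries over a single escape sequence, so for the third example from the question the "hello" line comes out with \e[1m but without the \e[35m. One way to carry the full attribute state is sketched below, swapping in awk for the first sed command. It is a sketch under stated assumptions, not part of the original approach: carry_sgr and colorgrep are hypothetical helper names, it assumes bash, GNU sed and a POSIX awk, it recognizes a reset only at the start of a sequence (\e[m, \e[0m, \e[00m, or \e[0;...m), and it does not collapse redundant sequences such as \e[34m\e[35m (which render the same as \e[35m anyway):

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the answer above). Assumes bash, GNU sed
# (for \x1b) and a POSIX awk.

# carry_sgr: print each line prefixed with every SGR sequence seen since the
# last reset, so text keeps the colors that were set on earlier lines.
carry_sgr() {
  awk '
    BEGIN { esc = "\033"; sgr = esc "\\[[0-9;]*m"; state = "" }
    {
      print state $0
      rest = $0
      while (match(rest, sgr)) {
        seq = substr(rest, RSTART, RLENGTH)
        if (seq ~ ("^" esc "\\[0*m$"))
          state = ""            # ESC[m, ESC[0m, ESC[00m: full reset
        else if (seq ~ ("^" esc "\\[0*;"))
          state = seq           # reset followed by new attributes
        else
          state = state seq     # add to the attributes already active
        rest = substr(rest, RSTART + RLENGTH)
      }
    }'
}

# colorgrep TERM: same pattern idea as above, with carry_sgr in front.
# Characters of TERM are still BRE syntax, so "." matches any character.
colorgrep() {
  carry_sgr | grep "$(printf '%s\n' "$1" | sed 's/./&\\(\x1b\\[[0-9;]*m\\)*/g')"
}

# Third example from the question; cat -v shows ESC as ^[.
printf '\033[35m rem\033[1moved line\nhello\033[00m world\n' | colorgrep hell | cat -v
# ^[[35m^[[1mhello^[[00m world
```

Because the state starts out empty, the first example passes through unchanged (no \e[0m is prepended), and the second yields \e[35mhello\e[00m world, matching the outputs listed in the question.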
Andrew Clark answered Oct 19 '22 19:10
