Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWK: Access captured group from line pattern

Tags:

regex

awk

With gawk, you can use the match function to capture parenthesized groups.

gawk 'match($0, pattern, ary) {print ary[1]}' 

example:

echo "abcdef" | gawk 'match($0, /b(.*)e/, a) {print a[1]}' 

outputs cd.

Note the specific use of gawk which implements the feature in question.

For a portable alternative you can achieve similar results with match() and substr.

example:

echo "abcdef" | awk 'match($0, /b[^e]*/) {print substr($0, RSTART+1, RLENGTH-1)}'

outputs cd.


That was a stroll down memory lane...

I replaced awk by perl a long time ago.

Apparently the AWK regular expression engine does not capture its groups.

you might consider using something like :

perl -n -e'/test(\d+)/ && print $1'

the -n flag causes perl to loop over every line like awk does.


This is something I need all the time so I created a bash function for it. It's based on glenn jackman's answer.

Definition

Add this to your .bash_profile etc.

function regex { gawk 'match($0,/'$1'/, ary) {print ary['${2:-'0'}']}'; }

Usage

Capture regex for each line in file

$ cat filename | regex '.*'

Capture 1st regex capture group for each line in file

$ cat filename | regex '(.*)' 1

You can use GNU awk:

$ cat hta
RewriteCond %{HTTP_HOST} !^www\.mysite\.net$
RewriteRule (.*) http://www.mysite.net/$1 [R=301,L]

$ gawk 'match($0, /.*(http.*?)\$/, m) { print m[1]; }' < hta
http://www.mysite.net/