awk extract multiple groups from each line

How do I perform an action on every matching group when a pattern matches multiple times in a line?

To illustrate, I want to search for /Hello! (\d+)/ and use the captured numbers, for example to print them or sum them. Take this input:

abcHello! 200 300 Hello! Hello! 400z3
ads
Hello! 0

If I printed them, I would expect this output:

200
400
0

Adrian Panasiuk asked Jul 12 '09 15:07


3 Answers

This uses only basic syntax, so it works in every awk (nawk, mawk, gawk, etc.).

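{
    # match() sets RSTART and RLENGTH to the position and length of the leftmost match
    while (match($0, /Hello! [0-9]+/)) {
        pattern = substr($0, RSTART, RLENGTH);   # the matched text, e.g. "Hello! 200"
        sub(/Hello! /, "", pattern);             # strip the prefix, leaving only the number
        print pattern;
        $0 = substr($0, RSTART + RLENGTH);       # continue searching after this match
    }
}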
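
The question also mentions summing the numbers. As a minimal sketch extending the loop above (not part of the original answer), the same match/substr approach can accumulate a total instead of printing each value. The function name `sum_hello` and the variable names are hypothetical, and the offset of 7 assumes the fixed prefix "Hello! " (seven characters).

```shell
# sum_hello is a hypothetical helper name, not part of the original answer.
sum_hello() {
    awk '{
        line = $0                      # work on a copy so $0 stays intact
        while (match(line, /Hello! [0-9]+/)) {
            # skip the 7-character "Hello! " prefix to keep only the digits
            total += substr(line, RSTART + 7, RLENGTH - 7)
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END { print total + 0 }' "$@"      # total + 0 prints 0 when nothing matched
}

# With the sample input from the question this prints 600
printf 'abcHello! 200 300 Hello! Hello! 400z3\nads\nHello! 0\n' | sum_hello
```

Working on a copy of the line rather than on $0 avoids re-splitting the fields on every iteration, which matters only if the rest of the script still needs them.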

Hirofumi Saito answered Nov 06 '22 20:11


This is gawk syntax (the three-argument form of match() is a gawk extension). It also works when the pattern has no fixed text that could serve as a record separator, provided the pattern doesn't match across linefeeds:

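 {
     pattern = "([a-g]+|[h-z]+)"
     # arr[1] receives the text captured by the first parenthesized group
     while (match($0, pattern, arr))
     {
         val = arr[1]
         print val
         # delete the leftmost match from $0 so the next iteration finds the following one
         sub(pattern, "")
     }
 }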
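
As a hedged sketch of how this could look for the pattern in the question (not part of the original answer), the capture group can be read directly from the array. This variant advances through a copy of the line with substr() instead of deleting matches with sub(). It assumes `gawk` is installed under that name, and `print_hello_gawk` is a hypothetical function name.

```shell
# print_hello_gawk is a hypothetical helper name; it requires GNU awk (gawk).
print_hello_gawk() {
    gawk '{
        line = $0
        # arr[1] holds the digits captured by the parenthesized group
        while (match(line, /Hello! ([0-9]+)/, arr)) {
            print arr[1]
            # move past this match rather than removing it with sub()
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}
```

For the sample input in the question, this should print 200, 400 and 0 on separate lines.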

Adrian Panasiuk answered Nov 06 '22 21:11


With GNU awk, which allows the record separator RS to be longer than one character:

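awk 'BEGIN{ RS="Hello! ";}   # split the input into records at each "Hello! "
{
    # keep only the leading digits of the first field; note that the text
    # before the first "Hello! " is also a record and is treated the same way
    gsub(/[^0-9].*/,"",$1)
    if ($1 != ""){ 
        print $1 
    }
}' file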

ghostdog74 answered Nov 06 '22 20:11