
How to print matched regex pattern using awk?

Tags: regex, awk



The most basic form is:

awk '/pattern/{ print $0 }' file

This asks awk to search for the pattern between the slashes and print every line that matches. By default awk treats each line as a record, and $0 denotes the whole record. The awk documentation covers these basics and is worth reading.

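If you only want to print the matching word, loop over the fields. Note that the command below compares each field with the literal string yyy; use $i ~ /regex/ instead to test each field against a regex.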

awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file

It sounds like you are trying to emulate the -o behaviour of GNU grep. This will do that, provided you only want the first match on each line:

awk 'match($0, /regex/) {
    print substr($0, RSTART, RLENGTH)
}
' file

Here's an example, using GNU's awk implementation (gawk):

awk 'match($0, /a.t/) {
    print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art

Read about match, substr, RSTART and RLENGTH in the awk manual.

After that you may wish to extend this to deal with multiple matches on the same line.
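
One way to handle several matches per line is to call match() in a loop and cut the matched part off the front of the string after each hit. The following is only a sketch: print_all_matches is a hypothetical helper name, the regex is passed as a string through -v (so backslashes in it go through awk's string escape processing), and the loop simply stops if the regex matches the empty string.

```shell
# Print every non-overlapping match of a regex, one per output line.
# print_all_matches is a hypothetical helper name, not a standard tool.
print_all_matches() {
    awk -v re="$1" '{
        line = $0
        # match() sets RSTART and RLENGTH for the leftmost match in line
        while (match(line, re)) {
            # stop on an empty match to avoid looping forever
            if (RLENGTH == 0) break
            print substr(line, RSTART, RLENGTH)
            # keep only the text after this match and search again
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

echo "act apt art" | print_all_matches 'a.t'
# prints:
# act
# apt
# art
```

Because each search runs on the remainder of the line, an anchor such as ^ would also match at the start of that remainder, so this approach suits unanchored patterns.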


gawk can print the matching part of every line using this action:

{ if (match($0,/your regexp/,m)) print m[0] }

From the gawk manual (http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions):

match(string, regexp [, array]): If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp. If regexp contains parentheses, the integer-indexed elements of array are set to contain the portion of string matching the corresponding parenthesized subexpression.


If Perl is an option, you can try this:

perl -lne 'print $1 if /(regex)/' file

To make the match case-insensitive, add the i modifier:

perl -lne 'print $1 if /(regex)/i' file

To print everything after the match (the rest of the matching line and all following lines):

perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile

To print the match and everything after the match:

perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile

If you are only interested in the last line of input and expect only one match (for example, part of the summary line of a shell command), you can also try this very compact code, adapted from How to print regexp matches using `awk`?:

$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy

Or a more complex version that prints only a captured group:

$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b

Warning: the three-argument form of match() is a gawk extension and does not exist in mawk.
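
If gawk is not available, the same value can usually be extracted with the two-argument match() plus RSTART and RLENGTH, which should also work in mawk. This is a sketch under stated assumptions: get_yyy_value is a hypothetical helper name, and the fixed offset of 4 relies on the prefix being exactly "yyy=". Unlike the END version above, it prints the value for every matching line rather than only the last one.

```shell
# Portable alternative to gawk's three-argument match().
# get_yyy_value is a hypothetical helper name used for illustration.
get_yyy_value() {
    awk 'match($0, /yyy=[^ ]+/) {
        # RSTART/RLENGTH span "yyy=<value>"; skip the 4 characters of "yyy="
        print substr($0, RSTART + 4, RLENGTH - 4)
    }'
}

echo "xxx=a yyy=b zzz=c" | get_yyy_value
# prints: b
```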

Here is another solution that uses a lookbehind regex in grep instead of awk. It does not need gawk, but it does require a grep that supports -P (Perl-compatible regular expressions), such as GNU grep:

$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b