
awk regex start of line anchor matches whitespace

Tags: regex, bash, awk

While parsing an input file with awk, I ran into an issue with regex anchors.

Given the following file:

 2015
2015
test
 test

Output with awk:

$ awk '$1 ~ /^[0-9]/' file
 2015
2015

Output with sed:

$ sed -n '/^[0-9]/p' file
2015

Can somebody explain the behaviour I'm seeing in awk?

Seen with

  • CentOS 7, GNU bash 4.2.46, GNU Awk 4.0.2
  • AIX 7, GNU bash 4.3.30, awk (default version in AIX), and gawk 4.0.2
asked Jun 05 '15 17:06 by sastorsl


2 Answers

The difference becomes clear if you run this awk command:

$ awk '/^[0-9]/' file
2015

Now awk matches against the full line, as sed does, rather than just the first field.

$1 ~ /^[0-9]/ compares only the first field. Because whitespace is awk's default field separator, leading whitespace is skipped during field splitting, so the first field is 2015 on both lines regardless of the spaces before it.
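
To see the field splitting directly, it can help to print $1 next to $0. The following is a minimal sketch, assuming a POSIX awk with the default field separator; the sample function is a hypothetical helper that recreates the question's four-line file, and the square brackets are only there to make the whitespace visible.

```shell
# Hypothetical helper: emits the sample input from the question
# (lines 1 and 4 start with a single space).
sample() { printf ' 2015\n2015\ntest\n test\n'; }

# Print the first field and the whole record for every line.
fields=$(sample | awk '{ printf "[%s] [%s]\n", $1, $0 }')
echo "$fields"

# Field match: leading blanks are skipped when $1 is set,
# so both 2015 lines match.
field_match=$(sample | awk '$1 ~ /^[0-9]/')

# Record match: ^ anchors to the real start of the line,
# so only the unindented 2015 line matches.
record_match=$(sample | awk '/^[0-9]/')

echo "$field_match"
echo "$record_match"
```

In the first listing, $1 never carries the leading space while $0 always does, which is why the anchor behaves differently in the two patterns.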

answered Oct 11 '22 23:10 by anubhava


The problem is that you are matching against the first field only.

Use awk '/^[0-9]/' file instead, which matches against the whole line.

Written out explicitly, that is:

awk '$0 ~ /^[0-9]/' file

Here $0 is the whole line, so the anchor applies to the start of the line rather than the start of the first field.
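
A bare /regex/ pattern in awk is matched against $0, so the short and explicit forms should select the same lines. This is a small sketch that checks it on the question's sample data; the sample function is a hypothetical helper, not part of the original post.

```shell
# Hypothetical helper: emits the sample input from the question.
sample() { printf ' 2015\n2015\ntest\n test\n'; }

# Implicit form: the pattern is tested against the whole record.
implicit=$(sample | awk '/^[0-9]/')

# Explicit form: the same test, with $0 spelled out.
explicit=$(sample | awk '$0 ~ /^[0-9]/')

if [ "$implicit" = "$explicit" ]; then
    echo "same output: $implicit"
fi
```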

answered Oct 11 '22 22:10 by bkmoney