awk regex start of line anchor matches whitespace

Question

While parsing an input file with awk, I ran into an issue with regex anchors.

Given the following file:

 2015
2015
test
 test

Output with awk

$ awk '$1 ~ /^[0-9]/' file
 2015
2015

Output with sed

$ sed -n '/^[0-9]/p' file
2015

Can somebody explain the behaviour I'm seeing in awk?

Seen with

CentOS 7, GNU bash 4.2.46, GNU Awk 4.0.2
AIX 7, GNU bash 4.3.30, awk (default version in AIX), and gawk 4.0.2

anubhava · Accepted Answer

You will understand the difference with this awk command:

$ awk '/^[0-9]/' file
2015

Now awk is operating on the full line, like sed, rather than on just the first field.

$1 ~ /^[0-9]/ compares only the first field. Whitespace is the default field separator in awk, and leading whitespace is discarded when the line is split into fields, so the first field is 2015 on both lines regardless of the spaces before it.
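
To see the field splitting directly, here is a minimal sketch that prints each line next to the first field awk extracts from it. The file name sample.txt is only an assumption for illustration; it recreates the four-line file from the question.

```shell
# Recreate the sample file from the question.
# sample.txt is a hypothetical name; lines 1 and 4 start with a single space.
printf ' 2015\n2015\ntest\n test\n' > sample.txt

# Print the whole line ($0) and the first field ($1), each in brackets
# so that any leading whitespace is visible.
awk '{ printf "line=[%s] first_field=[%s]\n", $0, $1 }' sample.txt
# line=[ 2015] first_field=[2015]
# line=[2015] first_field=[2015]
# line=[test] first_field=[test]
# line=[ test] first_field=[test]
```

The leading space survives in $0 but not in $1, which is why ^ in $1 ~ /^[0-9]/ lines up with the digit on both 2015 lines.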

bkmoney · Answer

The problem is that you are testing only the first field ($1), not the whole line.

Use awk '/^[0-9]/' file instead, which matches against the whole line.

To be more explicit:

awk '$0 ~ /^[0-9]/' file

This is what you want, since $0 is the whole line.
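
As a rough check that the bare pattern and the explicit $0 form behave the same way, the sketch below runs both next to the sed command from the question. Again, sample.txt is an assumed file name that recreates the question's input.

```shell
# Recreate the sample file from the question (sample.txt is a hypothetical name).
printf ' 2015\n2015\ntest\n test\n' > sample.txt

# A bare /regex/ pattern is tested against the whole line ($0).
awk '/^[0-9]/' sample.txt
# 2015

# The explicit form of the same test.
awk '$0 ~ /^[0-9]/' sample.txt
# 2015

# sed, for comparison: it also anchors ^ at the start of the whole line.
sed -n '/^[0-9]/p' sample.txt
# 2015
```

All three print only the line without leading whitespace, because in each case ^ anchors to the start of the full line rather than the start of a field.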

Tags: regex, bash, awk

sastorsl
