How to print the row number and starting location of a pattern when multiple matches per row are present?

Question

I want to use awk to match all the occurrences of a pattern within a large file. For each match, I would like to print the row number and the starting position of the pattern along the row (sort of xy coordinates). There are several occurrences of the pattern in each line. I found this somewhat related question.

So far, I managed to do it only for the first (leftmost) occurrence in each line. As an example:

echo xyzABCdefghiABCdefghiABCdef | awk 'match($0, /ABC/) {print NR, RSTART } '

The resulting output is:

1 4

But what I would expect is something like this:

1 4
1 13
1 22

I tried using split instead of match. I managed to identify all the occurrences, but RSTART is lost and printed as "0".

echo xyzABCdefghiABCdefghiABCdef | awk ' { split($0,t, /ABC/,m) ; for (i=1; i in m; i++) print (NR, RSTART) } '

Output:

1 0
1 0
1 0

Any advice would be appreciated. I am not limited to awk, but an awk solution would be preferred. Also, in my case the pattern to match would be a regex (/A.C/). Thank you.

The fourth bird · Accepted Answer

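Another option, using GNU awk, is split with a regex.

In gawk's split(string, array, fieldsep, seps), the 3rd argument is the field separator (here a regex) and the 4th argument is the seps array, which receives the text actually matched by each separator. The lengths of the pieces in the array and of the entries in seps can be added up to calculate the positions.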
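To make the roles of the two arrays concrete, here is a small sketch that prints what split stores for the regex /A.C/ from the question. It assumes GNU awk is installed as gawk (the 4-argument form of split is a gawk extension); show_split and the sample input are hypothetical, chosen only for illustration.

```shell
# show_split is a hypothetical helper name, not part of the original answer.
# Requires GNU awk: the 4th argument of split() is a gawk extension.
show_split() {
  gawk '{
    n = split($0, a, /A.C/, seps)
    for (i = 1; i <= n; i++) {
      printf "a[%d]=\"%s\"", i, a[i]                      # text between matches
      if (i < n) printf "  seps[%d]=\"%s\"", i, seps[i]   # the matched text itself
      print ""
    }
  }'
}

echo xyzABCdefghiAXCdef | show_split
# a[1]="xyz"  seps[1]="ABC"
# a[2]="defghi"  seps[2]="AXC"
# a[3]="def"
```

There is always one fewer separator than pieces, which is why a loop over the matches runs while i < n.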

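echo xyzABCdefghiABCdefghiABCdef |
awk '{
  # a[] holds the text between matches, seps[] the matched text (gawk only)
  n = split($0, a, /ABC/, seps); pos = 1
  for (i = 1; i < n; i++) {
    pos += length(a[i])     # skip the text before the i-th match
    print NR, pos           # 1-based start of the i-th match
    pos += length(seps[i])  # skip the match itself
  }
}'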

Output

1 4
1 13
1 22
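If GNU awk is not available, a similar result can probably be obtained with match() alone, by repeatedly matching against the remainder of the line and keeping a running offset. This is a different technique from the split approach above, offered only as a sketch: find_matches and its argument are hypothetical names, and the regex is passed in as a string.

```shell
# find_matches is a hypothetical helper name; the regex is passed as $1.
find_matches() {
  awk -v re="$1" '{
    s = $0; off = 0                      # off = characters already consumed
    while (match(s, re)) {
      if (RLENGTH == 0) break            # avoid looping forever on empty matches
      print NR, off + RSTART             # position in the original line
      off += RSTART + RLENGTH - 1
      s = substr(s, RSTART + RLENGTH)    # continue after this match
    }
  }'
}

echo xyzABCdefghiABCdefghiABCdef | find_matches 'A.C'
# 1 4
# 1 13
# 1 22
```

This reports non-overlapping matches only. A regex anchored with ^ would match again at the start of each remainder, so the sketch is not suitable for anchored patterns.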

Tags: bash, split, awk

RicGGG

1 Answers

The fourth bird
