grep matching specific position in lines using words from other file

Tags: grep, shell, unix

I have two files.

file1:

12342015010198765hello
12342015010188765hello
12342015010178765hello

Each line contains fields at fixed positions; for example, positions 13-17 hold the account_id.

file2:

98765
88765

which contains a list of account_ids.

In Korn Shell, I want to print the lines from file1 whose positions 13-17 match one of the account_ids in file2.

I can't do

grep -f file2 file1

because an account_id in file2 can also match other fields at other positions.

I tried putting a pattern like this in file2:

^.{12}98765.*

but it did not work.

asinkxcoswt asked Jul 10 '15 04:07


2 Answers

Using awk

$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello

How it works

  • NR==FNR{a[$1]=1;next;}

    FNR is the number of lines read so far from the current file and NR is the total number of lines read so far. Thus, if FNR==NR, we are reading the first file which is file2.

    Each ID in file2 is saved as a key in array a. Then, we skip the rest of the commands and jump to the next line.

  • substr($0,13,5) in a

    If we reach this command, we are working on the second file, file1.

    This condition is true if the 5-character substring that starts at position 13 is a key in array a. If the condition is true, awk performs the default action, which is to print the line.
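The two steps above can be tried end to end with the sample data from the question. This is only a sketch: the scratch directory and the variable names start, len, and ids are illustrative choices rather than part of the original answer, and passing the field position with -v is just one way to avoid hard-coding 13 and 5.

```shell
# Scratch directory (hypothetical) holding the sample data from the question.
dir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$dir/file1"
printf '%s\n' 98765 88765 > "$dir/file2"

# start and len are illustrative names for the field position and width.
out=$(awk -v start=13 -v len=5 '
    NR == FNR { ids[$1] = 1; next }   # first file (file2): remember each account_id
    substr($0, start, len) in ids     # second file (file1): print lines whose field is a known id
' "$dir/file2" "$dir/file1")

printf '%s\n' "$out"
rm -rf "$dir"
```

With the sample files this should print the two lines ending in 98765hello and 88765hello, in the order they appear in file1.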

Using grep

You mentioned trying

grep '^.{12}98765.*' file1

That pattern uses extended regex syntax, so -E is required; without it, grep treats { and } as literal characters. Also, there is no value in matching .* at the end: it will always match. Thus, try:

$ grep -E '^.{12}98765' file1
12342015010198765hello

To get both lines:

$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello

This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.
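If the IDs have to come from file2 rather than from a hand-written bracket expression, one possible approach is to generate an anchored pattern per ID and hand the result to grep -E -f. The following is a minimal sketch under that assumption; the scratch directory and the patterns file name are hypothetical, and it assumes the IDs contain no regex metacharacters.

```shell
# Scratch directory (hypothetical) holding the sample data from the question.
dir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$dir/file1"
printf '%s\n' 98765 88765 > "$dir/file2"

# Prefix every non-empty ID with "^.{12}" so it can only match at position 13.
# Empty lines are skipped because a bare "^.{12}" would match almost every line.
sed -n '/./s/^/^.{12}/p' "$dir/file2" > "$dir/patterns"

# -E enables the {12} interval syntax; -f reads one pattern per line.
out=$(grep -E -f "$dir/patterns" "$dir/file1")

printf '%s\n' "$out"
rm -rf "$dir"
```

This keeps the original grep -f idea from the question while restricting each ID to the account_id field.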

John1024 answered Dec 05 '22 17:12


Using sed with extended regex:

sed -r 's@.*@/^.{12}&/p@' file2 |sed -nr -f- file1

Using Basic regex:

sed 's@.*@/^.\\{12\\}&/p@' file2 |sed -n -f- file1

Explanation:

sed -r 's@.*@/^.{12}&/p@' file2

will generate an output:

/^.{12}98765/p
/^.{12}88765/p

which the second sed, after the pipe, reads as its script (-f-) and runs against file1, printing:

12342015010198765hello
12342015010188765hello
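Not every sed accepts -r or reads a script from standard input with -f-, so a variant that writes the generated script to a file may be easier to run on other systems. This is a sketch of the basic-regex form, assuming the IDs contain no regex metacharacters or slashes; the scratch directory and the script.sed file name are hypothetical.

```shell
# Scratch directory (hypothetical) holding the sample data from the question.
dir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello 12342015010178765hello > "$dir/file1"
printf '%s\n' 98765 88765 > "$dir/file2"

# Turn each ID into a basic-regex print command such as /^.\{12\}98765/p.
# In the replacement, \\ produces one literal backslash and & is the matched ID.
sed 's@.*@/^.\\{12\\}&/p@' "$dir/file2" > "$dir/script.sed"

# Run the generated script against file1; -n suppresses the default output,
# so only lines matched by one of the /.../p commands are printed.
out=$(sed -n -f "$dir/script.sed" "$dir/file1")

printf '%s\n' "$out"
rm -rf "$dir"
```

Because each ID becomes its own print command, an ID that appears twice in file2 would cause its matching lines to be printed twice.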

Jahid answered Dec 05 '22 18:12