
awk search column from one file, if match print columns from both files

I'm trying to compare column 1 of file1 with column 3 of file2. If they match, I want to print the first column from file1 and the first two columns from file2.

Here's a sample from each file:

file1

Cre01.g000100   
Cre01.g000500  
Cre01.g000650  

file2

chromosome_1    71569  |655|Cre01.g000500|protein_coding|CODING|PAC:26902937|1|1)
chromosome_1    93952  |765|Cre01.g000650|protein_coding|CODING|PAC:26903448|11|1)
chromosome_1    99034  |1027|Cre01.g000100 |protein_coding|CODING|PAC:26903318|9|1)

desired output

Cre01.g000100  chromosome_1    99034        
Cre01.g000500  chromosome_1    71569   
Cre01.g000650  chromosome_1    93952

I've been looking at various threads that are somewhat similar, but I can't seem to get it to print the columns from both files. Here are some links that are somewhat related:

awk compare 2 files, 2 fields different order in the file, print or merge match and non match lines

Obtain patterns from a file, compare to a column of another file, print matching lines, using awk

awk compare columns from two files, impute values of another column

Obtain patterns in one file from another using ack or awk or better way than grep?

Awk - combine the data from 2 files and print to 3rd file if keys matched

I feel like I should have been able to figure it out from these threads, but I've spent two days trying different variations of the code without getting anywhere. Here is some of the code I've tried on my files:

awk 'FNR==NR{a[$3]=$1;next;}{print $0 ($3 in a ? a[$3]:"NA")}' file1 file2

awk 'NR==FNR{ a[$1]; next} ($3 in a) {print $1 $2 a[$1]}' file1 file2

awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' file1 file2

I know I have to create a temporary array that holds the first column of file1 (or the third column of file2) and then compare it against the other file. If there is a match, it should print the first column from file1 and columns 1 and 2 from file2.

Thanks for the help!

Luke Anderson-Trocme asked Oct 31 '22 00:10



1 Answer

You can use this awk command, which reads file2 first and then file1:

awk -F '[| ]+' -v OFS='\t' 'NR==FNR{a[$4]=$1 OFS $2; next}
       $1 in a{print $1, a[$1]}' file2 file1

Output:

Cre01.g000100   chromosome_1    99034
Cre01.g000500   chromosome_1    71569
Cre01.g000650   chromosome_1    93952
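
To see why this works, here is the same one-liner spread out with comments and run against the sample data from the question. The scratch directory and the `result` variable are only there for illustration. This is a sketch that assumes the columns in file2 are separated by spaces, as in the sample shown.

```shell
# Recreate the sample files from the question in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' 'Cre01.g000100   ' 'Cre01.g000500  ' 'Cre01.g000650  ' > "$tmp/file1"
printf '%s\n' \
  'chromosome_1    71569  |655|Cre01.g000500|protein_coding|CODING|PAC:26902937|1|1)' \
  'chromosome_1    93952  |765|Cre01.g000650|protein_coding|CODING|PAC:26903448|11|1)' \
  'chromosome_1    99034  |1027|Cre01.g000100 |protein_coding|CODING|PAC:26903318|9|1)' \
  > "$tmp/file2"

result=$(awk -F '[| ]+' -v OFS='\t' '
  # First file on the command line (file2): NR==FNR is true only while reading it.
  # Splitting on runs of spaces and pipes makes the gene ID field 4,
  # so store "chromosome <tab> position" under that ID.
  NR == FNR { a[$4] = $1 OFS $2; next }
  # Second file (file1): if its ID was stored, print the ID and the saved columns.
  $1 in a   { print $1, a[$1] }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The file order matters: file2 comes first so that its IDs are already in the array when file1 is scanned, and the output follows the order of file1. If file2 turned out to be tab-separated, the separator would presumably need to include tabs as well, for example `-F '[| \t]+'`.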
anubhava answered Nov 11 '22 18:11
