Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join two files using awk

Tags:

file

join

awk

I have two files like shown below which are tab-delimited:

file A

chr1   123 aa b c d
chr1   234 a  b c d
chr1   345 aa b c d
chr1   456 a  b c d
....

file B

xxxx  abcd    chr1   123    aa    c    d    e
yyyy  defg    chr1   345    aa    e    f    g
...

I want to join the two files based on 3 columns with "chr1", "123" and "aa" and add first two columns from file B to file A, such that output looks as shown below: output:

chr1   123    aa    b    c    d    xxxx    abcd
chr1   234    a     b    c    d
chr1   345    aa    b    c    d    yyyy    defg
chr1   456    a    b    c    d

Could anyone help to do this in awk. If possible using awk oneliners?

like image 297
chas Avatar asked Nov 06 '12 20:11

chas


People also ask

How do I merge two files together?

Open the two files you want to merge. Select all text (Command+A/Ctrl+A) from one document, then paste it into the new document (Command+V/Ctrl+V). Repeat steps for the second document. This will finish combining the text of both documents into one.

How do I combine multiple files into one in Linux?

Appending content to an existing file To append content after you merge multiple files in Linux to another file, use double redirection operator. (>>) along with cat command. Rather than overwriting the contents of the file, this command appends the content at the end of the file.

What is NR and FNR in awk?

NR and FNR are two built-in awk variables. NR tells us the total number of records that we've read so far, while FNR gives us the number of records we've read in the current input file.


1 Answers

Here is one approach using awk:

$ awk 'NR==FNR{a[$3,$4]=$1OFS$2;next}{$6=a[$1,$2];print}' OFS='\t' fileb filea
chr1    123     a    b    c     xxxx    abcd
chr1    234     a    b    c 
chr1    345     a    b    c     yyyy    defg
chr1    456     a    b    c 

Explanation:

NR==FNR             # current recond num match the file record num i.e in filea
a[$3,$4]=$1OFS$2    # Create entry in array with fields 3 and 4 as the key
next                # Grab the next line (don't process the next block)
$6=a[$1,$2]         # Assign the looked up value to field 6 (+rebuild records)  
print               # Print the current line & the matching entry from fileb ($6)

OFS='\t'            # Seperate each field with a single TAB on output

Edit:

For the 3 field problem you simple add the extra field:

$ awk 'NR==FNR{a[$3,$4,$5]=$1OFS$2;next}{$6=a[$1,$2,$3];print}' OFS='\t' fileb filea
chr1    123    aa     b      c     xxxx     abcd
chr1    234    a      b      c  
chr1    345    aa     b      c     yyyy     defg
chr1    456    a      b      c 
like image 177
Chris Seymour Avatar answered Sep 28 '22 04:09

Chris Seymour