
unix: merge 2 files using 2nd columns

Tags: merge, unix, awk

I'd like to merge two files according to the content of their 2nd columns.

File 1:

"4742"  "209220_at"     2.60700394801826
"104"   "209396_s_at"   2.60651442103297
"749"   "202409_at"     2.59424724783704
"4168"  "209875_s_at"   2.58773204877464
"3973"  "1431_at"       2.52832098784342
"1826"  "207201_s_at"   2.41685345240968

File 2:

"653"   "1431_at"       2.14595534191867
"1109"  "207201_s_at"   2.13777517447307
"353"   "212531_at"     2.12706340284672
"381"   "206535_at"     2.11456707231618
"1846"  "204534_at"     2.10919474441178

The desired result is:

"3973"  "1431_at"       2.52832098784342 "653"   "1431_at"       2.14595534191867
"1826"  "207201_s_at"   2.41685345240968 "1109"  "207201_s_at"   2.13777517447307

I have tried comm, diff, and some obscure awk one-liners without any success. Any help is much appreciated. Ben

Benoit B. asked Feb 11 '11 16:02



2 Answers

You can do that with a combination of the sort and join commands. The straightforward approach is

join -j2 <(sort -k2 file1) <(sort -k2 file2)

but that displays slightly differently from what you're looking for. It just shows the common join field, followed by the remaining fields from each file:

"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307

If you need the format exactly as you showed, then you need to tell join to output the fields in that order:

join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)

where -o accepts a list of FILENUM.FIELDNUM specifiers.

Note that the <() syntax I'm using (process substitution) isn't POSIX sh, so you should sort to temporary files if you need POSIX sh syntax.
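For reference, a POSIX sh version of the same idea might look like the sketch below. It recreates the question's sample data so that it runs on its own; the names file1.sorted, file2.sorted and merged.txt are placeholders chosen for illustration. It also departs from the commands above in a few ways that are assumptions rather than part of the answer: join is spelled with -1 2 -2 2, -k 2,2 limits the sort key to column 2, -b ignores the padding in front of that column, and LC_ALL=C keeps sort and join on the same collation order.

```shell
#!/bin/sh
# Sample data from the question (whitespace-separated columns).
cat > file1 <<'EOF'
"4742"  "209220_at"     2.60700394801826
"104"   "209396_s_at"   2.60651442103297
"749"   "202409_at"     2.59424724783704
"4168"  "209875_s_at"   2.58773204877464
"3973"  "1431_at"       2.52832098784342
"1826"  "207201_s_at"   2.41685345240968
EOF
cat > file2 <<'EOF'
"653"   "1431_at"       2.14595534191867
"1109"  "207201_s_at"   2.13777517447307
"353"   "212531_at"     2.12706340284672
"381"   "206535_at"     2.11456707231618
"1846"  "204534_at"     2.10919474441178
EOF

# Sort each input on column 2 only, writing to temporary files
# (placeholder names) instead of using <(...).
LC_ALL=C sort -b -k 2,2 file1 > file1.sorted
LC_ALL=C sort -b -k 2,2 file2 > file2.sorted

# Join on column 2 of both files and print all three columns of each.
LC_ALL=C join -1 2 -2 2 -o 1.1,1.2,1.3,2.1,2.2,2.3 \
    file1.sorted file2.sorted > merged.txt

cat merged.txt

# Clean up the temporary files.
rm -f file1.sorted file2.sorted
```

With the sample data this should print the two matching rows ("1431_at" and "207201_s_at"), with the fields separated by single spaces rather than the original padding.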

jamessan answered Sep 24 '22 02:09


awk '
  # store the first file, indexed by col2
  NR==FNR {f1[$2] = $0; next}
  # output only if file1 contains file2's col2
  ($2 in f1) {print f1[$2], $0}
' file1 file2
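
To try the script above on the question's sample data, a self-contained sketch could look like this. The output name merged_awk.txt is a placeholder chosen for illustration. Because the stored file1 line and the current file2 line are both printed unchanged, the original column spacing survives, and the rows come out in file2's order without any sorting.

```shell
#!/bin/sh
# Sample data from the question (whitespace-separated columns).
cat > file1 <<'EOF'
"4742"  "209220_at"     2.60700394801826
"104"   "209396_s_at"   2.60651442103297
"749"   "202409_at"     2.59424724783704
"4168"  "209875_s_at"   2.58773204877464
"3973"  "1431_at"       2.52832098784342
"1826"  "207201_s_at"   2.41685345240968
EOF
cat > file2 <<'EOF'
"653"   "1431_at"       2.14595534191867
"1109"  "207201_s_at"   2.13777517447307
"353"   "212531_at"     2.12706340284672
"381"   "206535_at"     2.11456707231618
"1846"  "204534_at"     2.10919474441178
EOF

awk '
  # first file: remember each whole line under its column-2 value
  NR==FNR {f1[$2] = $0; next}
  # second file: print the remembered file1 line next to the matching line
  ($2 in f1) {print f1[$2], $0}
' file1 file2 > merged_awk.txt

cat merged_awk.txt
```

One caveat of this approach: if a column-2 value occurs more than once in file1, only its last line is kept in f1, since later assignments overwrite earlier ones.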
glenn jackman answered Sep 24 '22 02:09