I'd like to merge two files by matching the values in their 2nd columns.
File 1:
"4742" "209220_at" 2.60700394801826
"104" "209396_s_at" 2.60651442103297
"749" "202409_at" 2.59424724783704
"4168" "209875_s_at" 2.58773204877464
"3973" "1431_at" 2.52832098784342
"1826" "207201_s_at" 2.41685345240968
File 2:
"653" "1431_at" 2.14595534191867
"1109" "207201_s_at" 2.13777517447307
"353" "212531_at" 2.12706340284672
"381" "206535_at" 2.11456707231618
"1846" "204534_at" 2.10919474441178
The result I want:
"3973" "1431_at" 2.52832098784342 "653" "1431_at" 2.14595534191867
"1826" "207201_s_at" 2.41685345240968 "1109" "207201_s_at" 2.13777517447307
I have tried comm, diff, and some obscure awk one-liners without any success.
Any help would be much appreciated.
Ben
NOTE: When using the join command, both input files must be sorted on the key field they are joined on. The output then contains the key, followed by the remaining columns of the first file (file1), followed by the remaining columns of the second file (file2).
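To check whether a file already meets join's sorting requirement, sort -c can be used. A minimal sketch, run in a scratch directory on the question's file1; it assumes the key is field 2 only (-k2,2) and that sort and join run under the same locale:

```shell
#!/bin/sh
# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
"4742" "209220_at" 2.60700394801826
"104" "209396_s_at" 2.60651442103297
"749" "202409_at" 2.59424724783704
"4168" "209875_s_at" 2.58773204877464
"3973" "1431_at" 2.52832098784342
"1826" "207201_s_at" 2.41685345240968
EOF

# sort -c exits non-zero (and reports the first disorder on stderr)
# when the input is not already sorted on the given key.
if sort -c -k2,2 file1 2>/dev/null; then raw_sorted=yes; else raw_sorted=no; fi

# Sort on field 2 only, then check the sorted copy the same way.
sort -k2,2 file1 > file1.sorted
if sort -c -k2,2 file1.sorted 2>/dev/null; then fixed_sorted=yes; else fixed_sorted=no; fi

echo "file1 sorted on key: $raw_sorted"
echo "file1.sorted sorted on key: $fixed_sorted"
```

The raw file1 fails the check (209396_s_at is followed by 202409_at), while the sorted copy passes, so only the latter is safe to hand to join.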
paste concatenates files column-wise: line N of each file is placed on output line N, separated by a tab by default, without matching on any key. Its syntax is: $ paste file1 file2 file3 …
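Because paste pairs rows purely by position, it cannot do the key-based merge asked about here. A small sketch showing this, using two rows from each of the question's files with file2's rows deliberately placed in a different order (illustrative data, not the original files):

```shell
#!/bin/sh
# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '"3973" "1431_at" 2.52832098784342' \
              '"1826" "207201_s_at" 2.41685345240968' > file1
# Same keys as file1, but in the opposite order.
printf '%s\n' '"1109" "207201_s_at" 2.13777517447307' \
              '"653" "1431_at" 2.14595534191867' > file2

# -d ' ' puts a space instead of the default tab between the pasted lines.
pasted=$(paste -d ' ' file1 file2)
printf '%s\n' "$pasted"
```

The first output line pairs 1431_at with 207201_s_at, which is why join (or awk) is needed when rows must be matched by key.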
Appending to an existing file: to append the merged content of several files to another file, use cat with the append redirection operator (>>). Rather than overwriting the target file, this adds the content to its end.
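A minimal sketch of the append; merged.txt and its existing line are hypothetical:

```shell
#!/bin/sh
# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'existing line\n' > merged.txt   # hypothetical pre-existing content
printf 'a\n' > file1
printf 'b\n' > file2

# '>>' appends to merged.txt; a single '>' would truncate it first.
cat file1 file2 >> merged.txt
cat merged.txt
```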
In Unix and Unix-like operating systems (such as Linux), you can use the tar command (short for "tape archive") to combine multiple files into a single archive file for easier storage and distribution.
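A minimal sketch of creating an archive and listing what went into it; archive.tar is a hypothetical name:

```shell
#!/bin/sh
# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'a\n' > file1
printf 'b\n' > file2

tar -cf archive.tar file1 file2   # -c: create, -f: archive file name
listing=$(tar -tf archive.tar)    # -t: list the archive's members
printf '%s\n' "$listing"
```

GNU tar and bsdtar also accept -z to compress the archive with gzip (tar -czf archive.tar.gz file1 file2).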
You can do that with a combination of the sort and join commands. The straightforward approach is
join -j2 <(sort -k2,2 file1) <(sort -k2,2 file2)
but its output is formatted slightly differently from what you're looking for: it shows the common join field first, followed by the remaining fields from each file:
"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307
If you need exactly the format you showed, tell join which fields to output, and in what order:
join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2,2 file1) <(sort -k2,2 file2)
where -o accepts a comma-separated list of FILENUM.FIELDNUM specifiers.
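The field list can also contain 0, meaning the join field itself, which helps when only some columns are wanted. A sketch that keeps just the key and the numeric third column from each file; it sorts into temporary files and spells -j2 as -1 2 -2 2 so it runs under plain sh:

```shell
#!/bin/sh
# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
"4742" "209220_at" 2.60700394801826
"104" "209396_s_at" 2.60651442103297
"749" "202409_at" 2.59424724783704
"4168" "209875_s_at" 2.58773204877464
"3973" "1431_at" 2.52832098784342
"1826" "207201_s_at" 2.41685345240968
EOF
cat > file2 <<'EOF'
"653" "1431_at" 2.14595534191867
"1109" "207201_s_at" 2.13777517447307
"353" "212531_at" 2.12706340284672
"381" "206535_at" 2.11456707231618
"1846" "204534_at" 2.10919474441178
EOF

sort -k2,2 file1 > file1.sorted
sort -k2,2 file2 > file2.sorted

# 0 = join field, 1.3 = third field of file1, 2.3 = third field of file2.
result=$(join -o 0,1.3,2.3 -1 2 -2 2 file1.sorted file2.sorted)
printf '%s\n' "$result"
```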
Note that the <() (process substitution) syntax I'm using isn't POSIX sh; it works in bash and zsh. If you need POSIX sh, sort into temporary files first.
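A sketch of the POSIX sh route that needs only one temporary file: join reads one of its inputs from standard input when that operand is -, so file1 can be sorted in a pipeline. It assumes mktemp is available (it is on Linux, the BSDs, and macOS, though older POSIX editions do not specify it):

```shell
#!/bin/sh
# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
"4742" "209220_at" 2.60700394801826
"104" "209396_s_at" 2.60651442103297
"749" "202409_at" 2.59424724783704
"4168" "209875_s_at" 2.58773204877464
"3973" "1431_at" 2.52832098784342
"1826" "207201_s_at" 2.41685345240968
EOF
cat > file2 <<'EOF'
"653" "1431_at" 2.14595534191867
"1109" "207201_s_at" 2.13777517447307
"353" "212531_at" 2.12706340284672
"381" "206535_at" 2.11456707231618
"1846" "204534_at" 2.10919474441178
EOF

# Only file2 needs a sorted copy on disk.
sort -k2,2 file2 > file2.sorted

# "-" tells join to read its first input (sorted file1) from stdin.
result=$(sort -k2,2 file1 |
  join -o 1.1,1.2,1.3,2.1,2.2,2.3 -1 2 -2 2 - file2.sorted)
printf '%s\n' "$result"
```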
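Alternatively, awk can do it without any sorting: it loads file1 into an array keyed on column 2, then prints each line of file2 whose column 2 is in that array, prefixed by the matching file1 line.
awk '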
# store the first file, indexed by col2
NR==FNR {f1[$2] = $0; next}
# output only if file1 contains file2's col2
($2 in f1) {print f1[$2], $0}
' file1 file2
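The awk version prints matches in file2's order. If you would rather keep file1's order, one option is to swap the roles: load file2 into the array and stream file1. A sketch on the question's sample data; if file2 contains a key more than once, only its last line is kept:

```shell
#!/bin/sh
# Work in a throwaway directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
"4742" "209220_at" 2.60700394801826
"104" "209396_s_at" 2.60651442103297
"749" "202409_at" 2.59424724783704
"4168" "209875_s_at" 2.58773204877464
"3973" "1431_at" 2.52832098784342
"1826" "207201_s_at" 2.41685345240968
EOF
cat > file2 <<'EOF'
"653" "1431_at" 2.14595534191867
"1109" "207201_s_at" 2.13777517447307
"353" "212531_at" 2.12706340284672
"381" "206535_at" 2.11456707231618
"1846" "204534_at" 2.10919474441178
EOF

result=$(awk '
  # first operand (file2): store each line, indexed by col2
  NR == FNR { f2[$2] = $0; next }
  # second operand (file1): print file1 line, then its file2 match
  ($2 in f2) { print $0, f2[$2] }
' file2 file1)
printf '%s\n' "$result"
```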