I would like to merge multiple tables by row names. The tables differ in the number of rows, and they have both unique and shared rows, all of which should appear in the output. If possible I would like to solve the problem with awk, but I am also fine with other solutions.
table1.tab
a 5
b 5
d 9
table2.tab
a 1
b 2
c 8
e 11
I would like to obtain the following table as output:
table3.tab
a 5 1
b 5 2
d 9 0
c 0 8
e 0 11
I tried using join
join table1.tab table2.tab > table3.tab
but I get
table3.tab
a 5 1
b 5 2
Rows c, d and e are not in the output.
You want to do a full outer join:
join -a1 -a2 -o 0 1.2 2.2 -e "0" table1.tab table2.tab
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
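One caveat: join expects both inputs to be sorted on the join field. The sample tables already are, but that may not hold for real data. A minimal sketch that sorts first, assuming the two-column layout from the question (the sample files are recreated here only so the snippet runs on its own, and the .sorted file names are just illustrative):

```shell
# Recreate the sample tables from the question.
printf 'a 5\nb 5\nd 9\n' > table1.tab
printf 'a 1\nb 2\nc 8\ne 11\n' > table2.tab

# join needs both inputs sorted on the join field (column 1).
sort -k1,1 table1.tab > table1.sorted
sort -k1,1 table2.tab > table2.sorted

# -a1 -a2 : also print unpairable lines from both files (full outer join)
# -e 0    : fill missing fields with 0
# -o ...  : output the join field, then column 2 of each file
join -a1 -a2 -e 0 -o 0,1.2,2.2 table1.sorted table2.sorted > table3.tab
cat table3.tab
```

The comma-separated form of the -o list is used here because it is a single argument; it is equivalent to the space-separated list above.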
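This awk one-liner should work for your example (note that for(x in k) visits the keys in an unspecified order, so the rows are not guaranteed to come out sorted):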
awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}
END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' table1 table2
Test:
kent$ head f1 f2
==> f1 <==
a 5
b 5
d 9
==> f2 <==
a 1
b 2
c 8
e 11
kent$ awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' f1 f2
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
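Since the question mentions multiple tables, the same idea can be extended to any number of files. The following is only a sketch, not part of the original answer: it assumes every file has a key in column 1 and a value in column 2, that no input file is empty (an empty file would not get its own column), and it prints keys in the order they are first seen. The array names (seen, order, val) are made up for illustration, and the sample files are recreated so the snippet runs on its own.

```shell
# Recreate the sample tables from the question.
printf 'a 5\nb 5\nd 9\n' > table1.tab
printf 'a 1\nb 2\nc 8\ne 11\n' > table2.tab

merged=$(awk '
FNR == 1 { nfiles++ }            # first line of each file: start a new column
NF < 2   { next }                # skip blank or incomplete lines
!($1 in seen) {                  # remember each key in first-seen order
    seen[$1] = 1
    order[++n] = $1
}
{ val[$1, nfiles] = $2 }         # store the value for (key, file number)
END {
    for (i = 1; i <= n; i++) {
        line = order[i]
        for (f = 1; f <= nfiles; f++)
            line = line " " (((order[i], f) in val) ? val[order[i], f] : 0)
        print line
    }
}' table1.tab table2.tab)

printf '%s\n' "$merged"
```

Further files can be appended to the awk command line; each one adds a column, with 0 filled in where a key is missing. With the two sample tables the rows come out in the order a, b, d, c, e, which matches the table3.tab asked for in the question.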