Join multiple tables by row names [duplicate]

Tags: bash, shell, join, awk

I would like to merge multiple tables by row name. The tables differ in the number of rows, and each has both unique and shared rows, all of which should appear in the output. If possible I would like to solve the problem with awk, but I am also fine with other solutions.

table1.tab

a 5
b 5
d 9

table2.tab

a 1
b 2
c 8
e 11

I would like to obtain the following table as output:

table3.tab

a 5 1
b 5 2
d 9 0
c 0 8
e 0 11

I tried using join:

join table1.tab table2.tab > table3.tab

but I get

table3.tab

a 5 1
b 5 2

Rows c, d, and e are missing from the output.

user2715173 asked Aug 25 '13 09:08


2 Answers

You want to do a full outer join:

join -a1 -a2 -o 0 1.2 2.2 -e "0" table1.tab table2.tab

a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
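
In the command above, -a1 and -a2 also print the unpairable lines from each file, -o sets the output columns (0 is the join field), and -e supplies the filler for missing fields. join expects both inputs to be sorted on the join field, which happens to be true for the sample tables. If your real tables are not sorted, something like the following sketch may help. The function name full_outer_join and the temporary files are my own assumptions for illustration, not part of the original command.

```shell
# Hypothetical helper (name chosen for illustration): full outer join of two
# two-column, whitespace-separated tables that may be unsorted.
full_outer_join() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # join expects both inputs sorted on the join field (column 1).
    # LC_ALL=C keeps the collation used by sort and join consistent.
    LC_ALL=C sort -k1,1 "$1" > "$tmp1"
    LC_ALL=C sort -k1,1 "$2" > "$tmp2"
    # -a1 -a2: keep unpairable lines from both files (full outer join)
    # -e 0:    fill missing fields with 0
    # -o ...:  print the join field, then column 2 of each file
    LC_ALL=C join -a1 -a2 -e 0 -o 0,1.2,2.2 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}

# Demo with the question's data, deliberately written out of order.
workdir=$(mktemp -d)
printf '%s\n' 'd 9' 'a 5' 'b 5' > "$workdir/table1.tab"
printf '%s\n' 'e 11' 'c 8' 'a 1' 'b 2' > "$workdir/table2.tab"
full_outer_join "$workdir/table1.tab" "$workdir/table2.tab"
rm -rf "$workdir"
```

The rows come out sorted by key, so the order differs from the one in the question (d is printed after c).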

Clayton Stanley answered Oct 04 '22 02:10


This awk one-liner should work for your example:

awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}
END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' table1.tab table2.tab

Test:

kent$  head f1 f2
==> f1 <==
a 5
b 5
d 9

==> f2 <==
a 1
b 2
c 8
e 11

kent$  awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}'  f1 f2
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
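
Two caveats about the one-liner: it handles exactly two files, and awk does not define the order in which for (x in k) visits the keys, so the sorted output in the test is not guaranteed. If you need more than two tables, or want rows in the order they first appear (as in the question's table3.tab), a sketch along these lines might work. The function name merge_tables and the awk variable names are made up for illustration. It assumes two-column input and no empty files, since an empty file would not be counted as a column.

```shell
# Hypothetical helper (name chosen for illustration): merge any number of
# two-column tables by their first column, filling missing values with 0.
merge_tables() {
    awk '
        FNR == 1 { nfiles++ }                  # first line of a new input file
        !($1 in seen) {                        # remember keys in first-seen order
            seen[$1] = 1
            order[++nkeys] = $1
        }
        { val[$1, nfiles] = $2 }               # value of this key in this file
        END {
            for (i = 1; i <= nkeys; i++) {
                key = order[i]
                line = key
                for (f = 1; f <= nfiles; f++)
                    line = line " " (((key, f) in val) ? val[key, f] : 0)
                print line
            }
        }
    ' "$@"
}

# Demo with the tables from the question.
workdir=$(mktemp -d)
printf '%s\n' 'a 5' 'b 5' 'd 9' > "$workdir/table1.tab"
printf '%s\n' 'a 1' 'b 2' 'c 8' 'e 11' > "$workdir/table2.tab"
merge_tables "$workdir/table1.tab" "$workdir/table2.tab"
rm -rf "$workdir"
```

With the question's two tables this prints the rows in the order a, b, d, c, e, matching the requested table3.tab. Passing further file names adds one column per file.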

Kent answered Oct 04 '22 02:10