Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bash: join by numeric column

If I want to use join on my Ubuntu, I need to first sort both files lexicographically (according to join --help), and only then join them:

tail -n +2 meta/201508_1 | sort -k 1b,1 > meta.txt
tail -n +2 keywords/copy | sort -k 1b,1 > keywords.txt
join meta.txt keywords.txt -1 1 -2 1 -t $'\t' 

(I also remove the header from both of them using tail)

But instead of sorting files lexicographically, I would like to sort them numerically: the first column in both files is an ID.

tail -n +2 meta/201508_1 | sort -k1 -n > meta.txt
tail -n +2 keywords/copy.txt | sort -k1 -n > keywords.txt

And then join. But for join these files look unsorted:

join: meta.txt:10: is not sorted: 1023  301000  en
join: keywords.txt:2: is not sorted: 10 keyword1

If I add --nocheck-order to join, it doesn't join properly - it outputs just one line.

How do I join two files on their numerical ID in bash?

Sample (columns are tab-separated):

file 1

id  volume lang
1   10  en
2   20  en
5   30  en
6   40  en
10  50  en

file 2

id  keyword
4   kw1
2   kw2
10  kw3
1   kw4
3   kw5

desired output

1   kw4 10  en
2   kw2 20  en
10  kw3 50  en
like image 661
Alexey Grigorev Avatar asked Mar 24 '26 16:03

Alexey Grigorev


1 Answers

Both of these work. The first one (sort -b is recommended on the Mac)

join <(sed 1d file1 | sort -b) <(sed 1d file2 | sort -b) | sort -n

the Linux man page recommends sort -k 1b,1

join <(sed 1d file1 | sort -k 1b,1) <(sed 1d file2 | sort -k 1b,1) | sort -n

In any case, you need to sort them lexicographically to join them. At the end you can still sort the result numerically.

like image 67
Johannes Weiss Avatar answered Mar 27 '26 06:03

Johannes Weiss



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!