merge/join two tables fast linux command line

Question

Let us say I have two relatively large tab-delimited files file1.txt, file2.txt.

file1.txt
id	city	car	type	model

file2.txt 
id	name	rating

Let us suppose that file1.txt has 2000 unique ids, and therefore 2000 unique rows, and file2.txt has only 1000 unique rows, and therefore 1000 unique ids. Is there a way to merge the two tables?

Case 1. merge them by id in file1.txt, where when there is no id in file2.txt NAs would be filled in.

Case2. merge them by id in file2.txt, where when only the ids in file2.txt will be printed out with the fields in file1.txt and file2.txt.

Note: the merged new files should also be tab-delimited file, with a header file as well. Note2. I'd also appreciate suggestions on how to do it when there is no header as well.

Thanks!

sehe · Accepted Answer

join -j 1 <(sort file1.txt) <(sort file2.txt)

Does your 'case 2' approach with only standard unix tools. Of course, if the files are sorted, you can drop the sort.

If you included the headers, you might rely on the ids being numerical for sorting the joined header to the top:

join -j 1 <(sort file1.txt) <(sort file2.txt) | sort -n

With

file1.txt

id  city    car type    model
1   york    subaru  impreza king
2   kampala toyota  corolla sissy
3   luzern  chrysler    gravity falcon

file2.txt

id  name    rating
3   zanzini PG
2   tara    X

output:

id  city    car type    model   name    rating
2   kampala toyota  corolla sissy   tara    X
3   luzern  chrysler    gravity falcon  zanzini PG

PS To preserve the TAB separator character, pass the -t option:

 join -t'    ' ...

It's kind of hard to show on SO that ' ' contained a TAB character. Type it with ^VTAB (e.g. in bash)

AWE · Answer

This worked for me in case 1:

join -t $' ' -1 1 -2 1 -a 1 -a 2 <(sort fileone.txt) <(sort filetwo.txt) | sort -n -t $' ' > filethree.txt

then:

awk '{if(NF+0<7) printf "%s NA NA ", $0; else print $0}' filethree.txt

merge/join two tables fast linux command line

Tags:

linux

merge

join

unix

awk

Dnaiel

2 Answers

sehe

AWE

Recent Activity

Donate For Us

merge/join two tables fast linux command line

Tags:

linux

merge

join

unix

awk

Dnaiel

2 Answers

sehe

AWE

Related questions

Recent Activity

Donate For Us