Merging two files by a single column in unix

Tags: linux, merge, unix

I would like to merge two files by one column in unix.

I have file_a:

subjectid name age  
12 Jane 16  
24 Kristen 90  
15 Clarke 78  
23 Joann 31  

I have another file_b:

subjectid prob_disease  
12 0.009  
24 0.738  
15 0.392  
23 1.2E-5  

I would like to merge these files on the command line, joining files a and b by subjectid. Each file is about 2 million lines long; I tried this in R, but it froze because of the amount of data. Could someone please help me do this in Linux? Desired output:

subjectid prob_disease name age  
12 0.009 Jane 16  
24 0.738 Kristen 90   
15 0.392 Clarke 78  
23 1.2E-5 Joann 31

Please help and thank you!

CadisEtRama asked Mar 05 '12 23:03



2 Answers

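Check out join(1). In your case, you don't even need any flags, because both files list the same subject IDs in the same order (in general, join expects both inputs to be sorted on the join field):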

$ join file_b file_a
subjectid prob_disease name age
12 0.009 Jane 16
24 0.738 Kristen 90
15 0.392 Clarke 78
23 1.2E-5 Joann 31
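
When the two files do not list the subject IDs in the same order, the inputs generally need sorting first, and the header line has to be kept out of the sort. The following is a minimal sketch of one way to do that, not part of the answer above. The helper name merge_with_header and the temporary file names are made up for illustration, and it assumes whitespace-separated columns with exactly one header line per file.

```shell
#!/bin/sh
# Hypothetical helper: join two whitespace-delimited files on column 1.
# Assumes each file starts with exactly one header line.
merge_with_header() {
    tmp=$(mktemp -d) || return 1
    # The headers share the key name, so join can merge them like any other row.
    head -n 1 "$1" > "$tmp/h1"
    head -n 1 "$2" > "$tmp/h2"
    join "$tmp/h1" "$tmp/h2"
    # Sort the data rows on the key, which is what join expects in general.
    tail -n +2 "$1" | LC_ALL=C sort -k 1b,1 > "$tmp/d1"
    tail -n +2 "$2" | LC_ALL=C sort -k 1b,1 > "$tmp/d2"
    LC_ALL=C join "$tmp/d1" "$tmp/d2"
    rm -rf "$tmp"
}

# Demo with the sample data from the question, file_b rows deliberately shuffled.
demo=$(mktemp -d)
printf '%s\n' 'subjectid name age' '12 Jane 16' '24 Kristen 90' \
    '15 Clarke 78' '23 Joann 31' > "$demo/file_a"
printf '%s\n' 'subjectid prob_disease' '23 1.2E-5' '12 0.009' \
    '15 0.392' '24 0.738' > "$demo/file_b"
merge_with_header "$demo/file_b" "$demo/file_a"
rm -rf "$demo"
```

The data rows come out in key order (byte order under LC_ALL=C), not in the order of the original files.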
Carl Norum answered Oct 22 '22 10:10


You're looking for the join command:

$ cat test.1
12 Jane 16
24 Kristen 90
15 Clarke 78
23 Joann 31 
$ cat test.2
12 0.009
24 0.738
15 0.392
23 1.2E-5 
$ join -j1 -o 2.1,2.2,1.2,1.3  <(sort test.1) <(sort test.2)
12 0.009 Jane 16
15 0.392 Clarke 78
23 1.2E-5 Joann 31
24 0.738 Kristen 90
$ 
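
The sorted join above reorders the rows. If the original row order of the first file matters, a different technique, a hash lookup in awk rather than join, can be sketched as follows. The helper name merge_keep_order is hypothetical, and the sketch assumes that the lookup table (test.2 here) fits in memory, that it is not empty, and that its keys are unique.

```shell
#!/bin/sh
# Hypothetical helper: merge_keep_order LOOKUP MAIN
# Loads column 2 of LOOKUP into an awk array keyed by column 1, then streams
# MAIN and prints matching rows in the order they appear in MAIN. No sorting.
merge_keep_order() {
    awk 'NR == FNR { prob[$1] = $2; next }           # first file: remember key -> value
         $1 in prob { print $1, prob[$1], $2, $3 }   # second file: emit matching rows
    ' "$1" "$2"
}

# Demo with the test.1 / test.2 data shown above.
demo=$(mktemp -d)
printf '%s\n' '12 Jane 16' '24 Kristen 90' '15 Clarke 78' '23 Joann 31' > "$demo/test.1"
printf '%s\n' '12 0.009' '24 0.738' '15 0.392' '23 1.2E-5' > "$demo/test.2"
merge_keep_order "$demo/test.2" "$demo/test.1"
rm -rf "$demo"
```

Rows whose key is missing from the lookup file are dropped, which matches the default behaviour of join for unpairable lines.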
Kevin answered Oct 22 '22 11:10