Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

join 3 files by first Column with join (was awk)?

i have three similar files, they are all like this:

File A

ID1 Value1a
ID2 Value2a
  .
  .
  .
IDN Value2n

and i want an output like this

Output

ID1 Value1a Value1b Value1c
ID2 Value2a Value2b Value2c
.....
IDN ValueNa ValueNb ValueNc

Looking to the first line, i want value1A to be the value of id1 in fileA, value1B the value of id1 in fileB, and so on which each field and each line. I thougth it like a sql join. I've tried several things but none of them where even close.

EDIT: All files have the same length and ids.

like image 993
msemelman Avatar asked May 12 '10 04:05

msemelman


3 Answers

Give join(1) a try:

join fileA fileB | join - fileC
like image 192
Dennis Williamson Avatar answered Nov 15 '22 09:11

Dennis Williamson


join (Dennis's answer) is better, but just for kicks, here's what I came up with in awk:

awk '{a=$0; getline b <"fileB"; getline c <"fileC"; $0=a" "b" "c; print $1,$2,$4,$6}' <fileA
like image 32
David Z Avatar answered Nov 15 '22 09:11

David Z


Update: The question has been edited to state that all files contain all keys, so the accepted answer (join) is definitely better than this one. Only consider using this one if it's possible the keys may not be in all files.


If you're not too concerned about performance, you could try the quick and dirty:

$ cat file_a
    ID5 Value5a
    ID1 Value1a
    ID3 Value3a
    ID4 Value4a
    ID2 Value2a
$ cat file_b
    ID1 Value1b
    ID3 Value3b
$ cat file_c
    ID2 Value2c
    ID3 Value3c
    ID4 Value4c
    ID5 Value5c
$ cat qq.sh
    #!/bin/bash
    keylist=$(awk '{print $1'} file_[abc] | sort | uniq)
    for key in ${keylist} ; do
        val_a=$(grep "^${key} " file_a | awk '{print $2}') ; val_a=${val_a:--}
        val_b=$(grep "^${key} " file_b | awk '{print $2}') ; val_b=${val_b:--}
        val_c=$(grep "^${key} " file_c | awk '{print $2}') ; val_c=${val_c:--}
        echo ${key} ${val_a} ${val_b} ${val_c}
    done
$ ./qq.sh
    ID1 Value1a Value1b -
    ID2 Value2a - Value2c
    ID3 Value3a Value3b Value3c
    ID4 Value4a - Value4c
    ID5 Value5a - Value5c

This actually works out the keys first then gets the values from each file with that key, or - if it's not in the relevant file.

The grep commands will need to be adjusted if the file is more complex (either if field 1 isn't at the start of the line or is followed by a non-space separator) but this should be a reasonable first-cut solution. The likely grep to use in that case would be:

grep "^[ X]*${key}[ X]"

where X is actually the tab character, as this allows for zero-or-more spaces or tabs before the key and a space or tab to terminate the key.

If the files are particularly large, you may want to look into using the associative arrays within awk but, since there's no indication of the size, I'd start with this one until you get to the point where it's running too slow.

like image 21
paxdiablo Avatar answered Nov 15 '22 10:11

paxdiablo