Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Left outer join on two files in unix

Tags:

join

unix

I need to join two files on two fields. However i should retrieve all the values in file 1 even if the join fails its like a left outer join.

File 1:

01|a|jack|d 02|b|ron|c 03|d|tom|e 

File 2:

01|a|nemesis|f 02|b|brave|d 04|d|gorr|h 

output:

01|a|jack|d|nemesis|f 02|b|ron|c|brave|d 03|d|tom|e|| 
like image 587
user1824223 Avatar asked Nov 14 '12 16:11

user1824223


People also ask

How do I merge two files in UNIX?

The join command in UNIX is a command line utility for joining lines of two files on a common field. It can be used to join two files by selecting fields within the line and joining the files on them. The result is written to standard output.

When left outer join is used?

If you are joining two tables and want the result set to include unmatched rows from only one table, use a LEFT OUTER JOIN clause or a RIGHT OUTER JOIN clause. The matching is based on the join condition.

Is Left join left outer join?

There really is no difference between a LEFT JOIN and a LEFT OUTER JOIN. Both versions of the syntax will produce the exact same result in PL/SQL. Some people do recommend including outer in a LEFT JOIN clause so it's clear that you're creating an outer join, but that's entirely optional.

What command can be used to merge together two files line by line based on a common field?

The join command provides us with the ability to merge two files together using a common field in each file as the link between related lines in the files.


2 Answers

It's join -t '|' file1 file2 -a1

Options used:

t: Delimiter.
a: Decides the file number from which the unpaired lines have to be printed.

join -t '|' file1 file2 -a2 would do a right outer join.

Sample Run

   [aman@aman test]$ cat f1       01|a|jack|d      02|b|ron|c      03|d|tom|e     [aman@aman test]$ cat f2     01|a|nemesis|f      02|b|brave|d      04|d|gorr|h     [aman@aman test]$ join -t '|'  f1 f2 -a1     01|a|jack|d|a|nemesis|f      02|b|ron|c|b|brave|d      03|d|tom|e 
like image 73
axiom Avatar answered Sep 17 '22 13:09

axiom


To do exactly what the question asks is a bit more complicated than previous answer and would require something like this:

sed 's/|/:/2' file1 | sort -t: >file1.tmp sed 's/|/:/2' file2 | sort -t: >file2.tmp join -t':' file1.tmp file2.tmp -a1 -e'|' -o'0,1.2,2.2' | tr ':' '|' 

Unix join can only join on a single field AFAIK so you must use files that use a different delimiter to "join two files on two fields", in this case the first two fields. I'll use a colon :, however if : exists in any of the input you would need to use something else, a tab character for example might be a better choice for production use. I also re-sort the output on the new compound field, sort -t:, which for the example input files makes no difference but would for real world data. sed 's/|/:/2' replaces the second occurrence of pipe with colon on each line in file.

file1.tmp

01|a:jack|d 02|b:ron|c 03|d:tom|e 

file2.tmp

01|a:nemesis|f 02|b:brave|d 04|d:gorr|h 

Now we use join output filtered by tr with a few more advanced options:

  • -t':' specify the interim colon delimiter
  • -a1 left outer join
  • -e'|' specifies the replacement string for failed joins, basically the final output delimiter N-1 times where N is the number of pipe delimited fields joined to the right of the colon in file2.tmp. In this case N=2 so one pipe character.
  • -o'0,1.2,2.2' specifies the output format:
    • 0 join field
    • 1.2 field 2 of file1.tmp, i.e. everything right of colon
    • 2.2 field 2 of file2.tmp
  • tr ':' '|' Finally we translate the colons back to pipes for the final output.

The output now matches the question sample output exactly which the previous answer did not do:

01|a|jack|d|nemesis|f 02|b|ron|c|brave|d 03|d|tom|e|| 
like image 44
idm Avatar answered Sep 19 '22 13:09

idm