I need to join two files on two fields. However i should retrieve all the values in file 1 even if the join fails its like a left outer join.
File 1:
01|a|jack|d 02|b|ron|c 03|d|tom|e
File 2:
01|a|nemesis|f 02|b|brave|d 04|d|gorr|h
output:
01|a|jack|d|nemesis|f 02|b|ron|c|brave|d 03|d|tom|e||
The join command in UNIX is a command line utility for joining lines of two files on a common field. It can be used to join two files by selecting fields within the line and joining the files on them. The result is written to standard output.
If you are joining two tables and want the result set to include unmatched rows from only one table, use a LEFT OUTER JOIN clause or a RIGHT OUTER JOIN clause. The matching is based on the join condition.
There really is no difference between a LEFT JOIN and a LEFT OUTER JOIN. Both versions of the syntax will produce the exact same result in PL/SQL. Some people do recommend including outer in a LEFT JOIN clause so it's clear that you're creating an outer join, but that's entirely optional.
The join command provides us with the ability to merge two files together using a common field in each file as the link between related lines in the files.
It's join -t '|' file1 file2 -a1
Options used:
t: Delimiter.
a: Decides the file number from which the unpaired lines have to be printed.
join -t '|' file1 file2 -a2
would do a right outer join.
Sample Run
[aman@aman test]$ cat f1 01|a|jack|d 02|b|ron|c 03|d|tom|e [aman@aman test]$ cat f2 01|a|nemesis|f 02|b|brave|d 04|d|gorr|h [aman@aman test]$ join -t '|' f1 f2 -a1 01|a|jack|d|a|nemesis|f 02|b|ron|c|b|brave|d 03|d|tom|e
To do exactly what the question asks is a bit more complicated than previous answer and would require something like this:
sed 's/|/:/2' file1 | sort -t: >file1.tmp sed 's/|/:/2' file2 | sort -t: >file2.tmp join -t':' file1.tmp file2.tmp -a1 -e'|' -o'0,1.2,2.2' | tr ':' '|'
Unix join can only join on a single field AFAIK so you must use files that use a different delimiter to "join two files on two fields", in this case the first two fields. I'll use a colon :, however if : exists in any of the input you would need to use something else, a tab character for example might be a better choice for production use. I also re-sort the output on the new compound field, sort -t:
, which for the example input files makes no difference but would for real world data. sed 's/|/:/2'
replaces the second occurrence of pipe with colon on each line in file.
file1.tmp
01|a:jack|d 02|b:ron|c 03|d:tom|e
file2.tmp
01|a:nemesis|f 02|b:brave|d 04|d:gorr|h
Now we use join
output filtered by tr
with a few more advanced options:
-t':'
specify the interim colon delimiter-a1
left outer join-e'|'
specifies the replacement string for failed joins, basically the final output delimiter N-1 times where N is the number of pipe delimited fields joined to the right of the colon in file2.tmp. In this case N=2 so one pipe character.-o'0,1.2,2.2'
specifies the output format: 0
join field1.2
field 2 of file1.tmp, i.e. everything right of colon2.2
field 2 of file2.tmptr ':' '|'
Finally we translate the colons back to pipes for the final output.The output now matches the question sample output exactly which the previous answer did not do:
01|a|jack|d|nemesis|f 02|b|ron|c|brave|d 03|d|tom|e||
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With