Left outer join on two files in unix

Tags: join, unix

I need to join two files on two fields. However, I need to keep every row of file 1 even when the join finds no match, like a left outer join.

File 1:

    01|a|jack|d
    02|b|ron|c
    03|d|tom|e

File 2:

    01|a|nemesis|f
    02|b|brave|d
    04|d|gorr|h

Expected output:

    01|a|jack|d|nemesis|f
    02|b|ron|c|brave|d
    03|d|tom|e||

asked Nov 14 '12 16:11

user1824223

2 Answers

Use join -t '|' file1 file2 -a1

Options used:

-t: the field delimiter.
-a: the number of the file whose unpaired lines should also be printed.

join -t '|' file1 file2 -a2 would do a right outer join.

Sample run:

    [aman@aman test]$ cat f1
    01|a|jack|d
    02|b|ron|c
    03|d|tom|e
    [aman@aman test]$ cat f2
    01|a|nemesis|f
    02|b|brave|d
    04|d|gorr|h
    [aman@aman test]$ join -t '|' f1 f2 -a1
    01|a|jack|d|a|nemesis|f
    02|b|ron|c|b|brave|d
    03|d|tom|e
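Note that with these options join matches on the first field only, which is why the second field of file 2 is repeated in the output above. It also means rows that agree on field 1 but differ on field 2 are still paired. A minimal sketch of that behaviour, assuming a hypothetical extra row 03|x|zed|g that is not in the question's data:

```shell
# Hypothetical inputs: the question's file 1, and a file 2 whose last row
# shares field 1 (03) with file 1 but has a different field 2 (x vs d).
tmpdir=$(mktemp -d)
printf '%s\n' '01|a|jack|d' '02|b|ron|c' '03|d|tom|e' > "$tmpdir/f1"
printf '%s\n' '01|a|nemesis|f' '02|b|brave|d' '03|x|zed|g' > "$tmpdir/f2"

# Join on field 1 only (the default), keeping unpaired lines from f1.
single_key=$(join -t '|' -a1 "$tmpdir/f1" "$tmpdir/f2")
printf '%s\n' "$single_key"

rm -rf "$tmpdir"
```

The last line printed is 03|d|tom|e|x|zed|g, so the rows were paired even though their second fields differ. If both fields must match, the key has to cover both of them.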

answered Sep 17 '22 13:09

axiom

To do exactly what the question asks is a bit more complicated than the previous answer and requires something like this:

    sed 's/|/:/2' file1 | sort -t: -k1,1 >file1.tmp
    sed 's/|/:/2' file2 | sort -t: -k1,1 >file2.tmp
    join -t':' file1.tmp file2.tmp -a1 -e'|' -o'0,1.2,2.2' | tr ':' '|'

Unix join can only join on a single field, as far as I know, so to "join two files on two fields" (here the first two) you must first rewrite the files so that those two fields form one field under a different delimiter. I'll use a colon; if a colon can occur anywhere in the input you would need something else, and a tab character might be a better choice for production use. I also re-sort each file on the new compound field with sort -t: -k1,1, which makes no difference for the example input but would for real-world data. sed 's/|/:/2' replaces the second pipe on each line with a colon.
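As a quick check of the key-building step, the sketch below runs the sed substitution on one sample line and then sorts a few unsorted lines on the compound key. Here -k1,1 limits the comparison to the compound key (without a -k option, sort compares whole lines), and LC_ALL=C is my own assumption to make the collation order predictable.

```shell
# Replace only the 2nd pipe, so "01|a" becomes one colon-delimited key.
compound=$(printf '%s\n' '01|a|jack|d' | sed 's/|/:/2')
printf '%s\n' "$compound"    # 01|a:jack|d

# Sort on the compound key only (field 1 when ':' is the delimiter).
sorted=$(printf '%s\n' '03|d|tom|e' '01|a|jack|d' '02|b|ron|c' |
  sed 's/|/:/2' | LC_ALL=C sort -t: -k1,1)
printf '%s\n' "$sorted"
```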

file1.tmp

    01|a:jack|d
    02|b:ron|c
    03|d:tom|e

file2.tmp

    01|a:nemesis|f
    02|b:brave|d
    04|d:gorr|h

Now we run join with a few more advanced options and filter its output through tr:

-t':' specifies the interim colon delimiter
-a1 makes it a left outer join
-e'|' specifies the replacement string for failed joins: the final output delimiter repeated N-1 times, where N is the number of pipe-delimited fields to the right of the colon in file2.tmp. In this case N=2, so one pipe character.
-o'0,1.2,2.2' specifies the output format:
- 0 is the join field
- 1.2 is field 2 of file1.tmp, i.e. everything right of the colon
- 2.2 is field 2 of file2.tmp
tr ':' '|' finally translates the colons back to pipes for the final output.
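Putting the options together, here is a minimal end-to-end sketch rather than a definitive implementation. The function name left_join2, the use of mktemp, and LC_ALL=C are my own assumptions; it also assumes a colon never occurs in the data and that each file 2 row has exactly two fields after the key.

```shell
# left_join2 FILE1 FILE2 (hypothetical helper): left outer join of two
# pipe-delimited files on fields 1 and 2 taken together.
left_join2() {
  tmp=$(mktemp -d) || return 1
  # Fuse fields 1 and 2 into one key, then sort on that key only.
  sed 's/|/:/2' "$1" | LC_ALL=C sort -t: -k1,1 > "$tmp/a"
  sed 's/|/:/2' "$2" | LC_ALL=C sort -t: -k1,1 > "$tmp/b"
  # -a1 keeps unpaired FILE1 rows; -e fills the missing 2.2 field with '|'.
  LC_ALL=C join -t: -a1 -e'|' -o'0,1.2,2.2' "$tmp/a" "$tmp/b" | tr ':' '|'
  rm -rf "$tmp"
}

# Usage with the question's sample data.
dir=$(mktemp -d)
printf '%s\n' '01|a|jack|d' '02|b|ron|c' '03|d|tom|e' > "$dir/file1"
printf '%s\n' '01|a|nemesis|f' '02|b|brave|d' '04|d|gorr|h' > "$dir/file2"
joined=$(left_join2 "$dir/file1" "$dir/file2")
printf '%s\n' "$joined"
rm -rf "$dir"
```

Keeping the intermediate files in a temporary directory avoids leaving file1.tmp and file2.tmp behind, and placing the options before the file names works with join implementations that do not accept options after operands.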

The output now matches the question's sample output exactly, which the previous answer's output did not:

    01|a|jack|d|nemesis|f
    02|b|ron|c|brave|d
    03|d|tom|e||
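The N-1 rule for -e can be checked with wider rows. In this sketch the data is hypothetical: file 2 carries three fields after the key, so N=3 and the filler becomes two pipes. The inputs are written directly in the interim colon form to keep the example short.

```shell
tmp=$(mktemp -d)
# Hypothetical, already key-fused and sorted inputs.
printf '%s\n' '01|a:jack|d' '03|d:tom|e' > "$tmp/a"
printf '%s\n' '01|a:nemesis|f|x' > "$tmp/b"

# Three fields right of the colon in file 2, so -e takes two pipes.
wide=$(LC_ALL=C join -t: -a1 -e'||' -o'0,1.2,2.2' "$tmp/a" "$tmp/b" | tr ':' '|')
printf '%s\n' "$wide"
rm -rf "$tmp"
```

Both output lines then have seven pipe-delimited fields: 01|a|jack|d|nemesis|f|x for the matched row and 03|d|tom|e||| for the unmatched one.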

answered Sep 19 '22 13:09

idm
