extracting unique values between 2 sets/files

Tags:

linux

bash

scripting

command-line

perl

Working in a Linux shell environment, how can I accomplish the following:

Text file 1 contains:

1
2
3
4
5

Text file 2 contains:

6
7
1
2
3
4

I need to extract the entries in file 2 which are not in file 1. So '6' and '7' in this example.

How do I do this from the command line?

Many thanks!


asked Jan 17 '11 19:01

mark

2 Answers

$ awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
6
7
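To reproduce this end to end, here is a minimal sketch that writes the question's sample data (one entry per line) into a throwaway directory and runs the same awk command. The temporary directory and the use of printf to build the files are assumptions for illustration only.

```shell
#!/bin/sh
# Work in a temporary directory so real files are never touched; remove it on exit.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Sample data from the question, one entry per line.
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# Print the lines of file2 that never appear in file1.
# Expected output: 6 and 7, each on its own line.
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
```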

Explanation of how the code works:

If we're reading file1, record each line of text we see.
If we're reading file2 and haven't seen that line's text, print it.

Explanation of details:

FNR is the current file's record (line) number
NR is the overall record number across all input files
FNR==NR is true only while we are reading file1 (provided file1 is not empty)
$0 is the current line of text
a is an associative array (awk's hash); a[$0] is the element keyed by the current line of text
a[$0]++ records that we've seen the current line of text
next skips the remaining rules for the current line, so lines from file1 are never tested or printed
!($0 in a) is true only when we have not seen the line's text
When that pattern is true and no explicit action is given, awk applies its default action and prints the line
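One caveat with FNR==NR: if file1 is empty, FNR and NR stay equal while reading file2 too, so every line of file2 is absorbed by the first rule and nothing is printed. A common workaround, sketched below, is to compare the current file name against the first command-line argument instead. The sample files here are made up for illustration.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# file1 is deliberately empty; file2 holds the question's second data set.
: > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# FNR==NR version: prints nothing, because FNR==NR remains true throughout file2.
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2

# FILENAME-based version: FILENAME matches ARGV[1] only while reading file1,
# so with an empty file1 every line of file2 is printed.
awk 'FILENAME == ARGV[1] {a[$0]++; next} !($0 in a)' file1 file2
```

With a non-empty file1, the FILENAME-based version gives the same result as the original command.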

answered Sep 20 '22 00:09

SiegeX

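Using some lesser-known utilities (comm requires sorted input; -1 hides lines found only in the first file and -3 hides lines found in both, so only the lines unique to the second file remain):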

sort file1 > file1.sorted
sort file2 > file2.sorted
comm -1 -3 file1.sorted file2.sorted
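A minimal end-to-end run of these commands on the question's data, assuming the same one-entry-per-line files (created here with printf in a temporary directory, purely for illustration):

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# comm needs both inputs sorted in the same collating order.
sort file1 > file1.sorted
sort file2 > file2.sorted

# -1 hides lines only in file1, -3 hides lines in both; what remains is only in file2.
comm -1 -3 file1.sorted file2.sorted
```

Unlike the awk answer, the result comes out in sorted order rather than in file2's original order.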

This treats duplicate lines as separate entries: if file1 contains one 3 but file2 contains two, the output will still include one 3. If that is not what you want, pipe the output of sort through uniq before writing it to a file:

sort file1 | uniq > file1.sorted
sort file2 | uniq > file2.sorted
comm -1 -3 file1.sorted file2.sorted
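To see the duplicate handling described above, here is a small sketch with made-up data where file1 has one 3 and file2 has two. It uses sort -u, which for whole lines is equivalent to sort | uniq.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Made-up data: one "3" in file1, two in file2.
printf '%s\n' 1 3 > file1
printf '%s\n' 3 3 5 > file2

# Without de-duplication, comm pairs one "3" with file1's and reports the extra one.
sort file1 > file1.sorted
sort file2 > file2.sorted
comm -1 -3 file1.sorted file2.sorted    # prints 3 and 5

# With de-duplication, only the genuinely new entry remains.
sort -u file1 > file1.sorted
sort -u file2 > file2.sorted
comm -1 -3 file1.sorted file2.sorted    # prints 5
```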

The GNU coreutils package contains many more utilities for this kind of text manipulation, such as join, cut, paste and tr.

answered Sep 22 '22 00:09

Daniel Gallagher
