How can I find lines in one file but not the other using bash scripting?

Tags:

bash

shell

Imagine file 1:

#include "first.h"
#include "second.h"
#include "third.h"

// more code here
...

Imagine file 2:

#include "fifth.h"
#include "second.h"
#include "eigth.h"

// more code here
...

I want only the header lines that are included in file 2 but not in file 1. So, when run, a diff of file 1 and file 2 should produce:

#include "fifth.h"
#include "eigth.h"

I know how to do it in Perl/Python/Ruby, but I'd like to accomplish this without using a different programming language.

Senthess asked Aug 03 '11 20:08


3 Answers

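This is a one-liner, but it sorts the lines and so does not preserve their original order (it also relies on process substitution, so it needs bash or a similar shell):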

comm -13 <(grep '#include' file1 | sort) <(grep '#include' file2 | sort)

If you need to preserve the order:

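awk '
  # Skip every line that does not contain #include
  !/#include/ {next}
  # First file: remember each included header name
  FILENAME == ARGV[1] {include[$2]=1; next}
  # Second file: print includes whose header was not seen in the first file
  !($2 in include)
' file1 file2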
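
For anyone who wants to check both commands against the files from the question, here is a minimal sketch. The temporary directory and the variable names (dir, sorted_result, ordered_result) are made up for illustration, and the comm step writes the sorted lists to temporary files instead of using <(...), so it should also run in a plain POSIX sh, which has no process substitution.

```shell
# Hypothetical sample files mirroring the question (names are illustrative).
dir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' > "$dir/file1"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' > "$dir/file2"

# Sorted variant: comm needs sorted input; -13 hides the lines unique to
# file1 (column 1) and the lines common to both (column 3).
grep '#include' "$dir/file1" | sort > "$dir/file1.sorted"
grep '#include' "$dir/file2" | sort > "$dir/file2.sorted"
sorted_result=$(comm -13 "$dir/file1.sorted" "$dir/file2.sorted")

# Order-preserving variant: same awk program as in the answer.
ordered_result=$(awk '
  !/#include/ {next}
  FILENAME == ARGV[1] {include[$2]=1; next}
  !($2 in include)
' "$dir/file1" "$dir/file2")

printf 'sorted:\n%s\n' "$sorted_result"
# sorted:
# #include "eigth.h"
# #include "fifth.h"
printf 'ordered:\n%s\n' "$ordered_result"
# ordered:
# #include "fifth.h"
# #include "eigth.h"

rm -rf "$dir"
```

The two results contain the same lines; only the awk version keeps them in the order in which they appear in file2.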
glenn jackman answered Nov 16 '22 02:11


If it's ok to use a temp file, try this:

grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep include

This command:

  • extracts all includes from file1.h and writes them to the file /tmp/x
  • uses this file to get all lines from file2.h that are not contained in this list
  • extracts all includes from the remainder of file2.h

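It probably doesn't handle differences in whitespace correctly, though. Also note that grep -f treats each line from file1.h as a regular expression rather than a literal string, so a character such as the . in a header name matches any character.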

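EDIT: To prevent false positives (lines that merely contain the word "include" somewhere), anchor the pattern of the last grep so it only matches lines that start with #include (thanks to jw013 for mentioning this):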

grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep "^#include"
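
If the fixed name /tmp/x is a concern (two runs at the same time would overwrite each other's list), the same steps can be sketched with mktemp. This is only an illustration under a few assumptions: the function name includes_only_in_second is invented here, and the -F and -x options are additions that make grep compare whole lines literally instead of treating them as regular expressions.

```shell
# Hypothetical helper: print the #include lines of $2 that do not appear in $1.
includes_only_in_second() {
    # mktemp picks an unused file name, so parallel runs do not clash.
    tmp=$(mktemp) || return 1
    # Step 1: collect the #include lines of the first file.
    grep '^#include' "$1" > "$tmp"
    # Step 2: drop lines of the second file that are in that list
    #         (-F: literal strings, -x: whole-line matches only).
    # Step 3: keep only the #include lines of what remains.
    grep -v -F -x -f "$tmp" "$2" | grep '^#include'
    status=$?
    # Clean up the temporary file before returning the pipeline's status.
    rm -f "$tmp"
    return "$status"
}
```

It would be called as: includes_only_in_second file1.h file2.h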
Frank Schmitt answered Nov 16 '22 00:11


This variant requires an fgrep that can read its pattern list from standard input via -f -. GNU grep (i.e. practically any Linux system, and then some) should work fine.

# Find occurrences of '#include' in file1.h
fgrep '#include' file1.h |
# Remove any identical lines from file2.h
fgrep -vxf - file2.h |
# Result is all lines not present in file1.h.  Out of those, extract #includes
fgrep '#include'

This does not require any sorting, nor any explicit temporary files. In theory, fgrep -f could use a temporary file behind the scenes, but I believe GNU fgrep doesn't.
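
The same pipeline can also be written with grep -F, which is the standard spelling of fgrep; newer GNU grep releases print a warning that fgrep is obsolescent. The sketch below wraps it in a function so the file names become arguments. The name new_includes is made up for illustration, and reading the pattern list from standard input with -f - is assumed to be supported, as it is in GNU grep.

```shell
# Hypothetical helper: print the #include lines of $2 that are not in $1.
new_includes() {
    # Find the lines containing '#include' in the first file ...
    grep -F '#include' "$1" |
    # ... remove any identical lines (-x: whole line) from the second file ...
    grep -F -v -x -f - "$2" |
    # ... and keep only the #include lines of what is left.
    grep -F '#include'
}
```

It would be called as: new_includes file1.h file2.h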

tripleee answered Nov 16 '22 02:11
