 

Find non-matching lines of two files in bash

Tags:

bash

I'm still new to bash, and although I've found questions similar to mine, I still can't solve my problem. I have two files with two columns each, separated by a space.

file 1:

1 AGCATTTTTCAAACGAAAGATTTACTACCGATGTGT  
2 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA  
3 GATCGAACCGGCTGCCTACTGCGTGTAAAGCCGCCC  
4 CCGACACAGAGAACATTAGAATACTCAGAGCCATNN   
5 TAAGCCTGAGCCTAAACCTAAGCCTAAACATAAGAA  
6 AGCAGAGAAGAGATGAGTTGTCGAGTGAGGCGTAAG  
7 AACGTTGAAAAATTATCCCGTCAACAGTCTCCAGAA  
8 GCCAGAGAGTAAAATATTGGGTGAAGCCAGAGAGTA  
9 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA  

file 2:

1 AGCATTTTTCAAACGAAAGATTTACTACCGATGTGT  
2 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA  
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  
4 CCGACACAGAGAACATTAGAATACTCAGAGCCATNN  
5 TAAGCCTGAGCCTAAACCTAAGCCTAAACATAAGAA  
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  
8 GCCAGAGAGTAAAATATTGGGTGAAGCCAGAGAGTA  
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

I'd like to compare only the second columns of each file, line by line, and output a third file with only the non-matching lines.

output:

3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
agrobins asked Mar 30 '15 19:03


3 Answers

You can use awk:

awk 'NR==FNR{a[$2];next} !($2 in a)' file1 file2

Output:

3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Explanation:

NR == FNR {                  # true only while reading the first file (file1)
  a[$2]                      # store the second field as a key in array a
  next                       # skip to the next record; don't run the rule below
}
!($2 in a)                   # for file2, print the line if its second field is not a key in a
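The awk above collects every second field of file1 into a set, so a file2 line counts as matching if its sequence appears on any line of file1, not necessarily the same one. If a strictly line-by-line comparison is wanted, a minimal sketch keys the stored field on the line number FNR instead. It assumes both files have the same number of lines; the sample data and the output name file3 are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs in the question's "number sequence" layout.
printf '1 AAAA\n2 CCCC\n3 GGGG\n' > file1
printf '1 AAAA\n2 NNNN\n3 AAAA\n' > file2

# First file: store column 2 keyed by its line number within the file (FNR).
# Second file: print the line when column 2 differs from file1's column 2
# on the same line number, and write the result to a third file.
awk 'NR == FNR { col2[FNR] = $2; next }
     $2 != col2[FNR]' file1 file2 > file3

cat file3
```

With this sample data the set-based one-liner prints only "2 NNNN", because AAAA occurs on line 1 of file1; the positional version also reports "3 AAAA".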
anubhava answered Nov 17 '22 02:11


grep -vf file1 file2

Output:

3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
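grep -f treats each line of file1 as a regular expression, and -v prints the lines of file2 that match none of them. Because a pattern can match anywhere inside a line, "1 AAAA" also matches "11 AAAA"; adding -F (fixed strings) and -x (whole-line matches) avoids that. This compares whole lines, so the first column and any trailing whitespace have to be identical too. A sketch with hypothetical sample data and an assumed output file name, file3:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs: file2 line 2 differs, and "11 AAAA" would be
# wrongly treated as a match for the pattern "1 AAAA" without -x.
printf '1 AAAA\n2 CCCC\n' > file1
printf '1 AAAA\n2 NNNN\n11 AAAA\n' > file2

# Answer's version: patterns are regexes matched anywhere in the line,
# so only "2 NNNN" is printed and "11 AAAA" is hidden.
grep -vf file1 file2

# Stricter version: -F = fixed strings, -x = whole-line match.
# Keeps both "2 NNNN" and "11 AAAA".
grep -vxFf file1 file2 > file3
cat file3
```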
Cyrus answered Nov 17 '22 01:11


You could use diff for this; it prints the differences between two files.

/test>diff file1 file2
3c3
< 3 GATCGAACCGGCTGCCTACTGCGTGTAAAGCCGCCC
---
> 3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6,7c6,7
< 6 AGCAGAGAAGAGATGAGTTGTCGAGTGAGGCGTAAG
< 7 AACGTTGAAAAATTATCCCGTCAACAGTCTCCAGAA
---
> 6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9c9
< 9 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
---
> 9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

To keep only the differing lines from the second file, pipe the output through grep:

/test>diff file1 file2 | grep ">"
> 3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
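The grep step keeps diff's "> " prefix on every line, and grep ">" would also pick up a ">" appearing anywhere else in a line. A sketch that matches the prefix only at the start of a line and strips it with sed, giving the output format the question asks for (the sample data and the output name file3 are hypothetical):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs: only line 2 differs.
printf '1 AAAA\n2 CCCC\n3 GGGG\n' > file1
printf '1 AAAA\n2 NNNN\n3 GGGG\n' > file2

# sed -n suppresses default output; the p flag prints only lines where the
# substitution happened, i.e. lines that started with "> " (file2's side).
diff file1 file2 | sed -n 's/^> //p' > file3
cat file3
```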
JNevill answered Nov 17 '22 00:11