I'm still new to bash and I've found similar questions to mine, but I still can't solve my problem. I have two files with 2 columns each, separated by a space.
file 1:
1 AGCATTTTTCAAACGAAAGATTTACTACCGATGTGT
2 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
3 GATCGAACCGGCTGCCTACTGCGTGTAAAGCCGCCC
4 CCGACACAGAGAACATTAGAATACTCAGAGCCATNN
5 TAAGCCTGAGCCTAAACCTAAGCCTAAACATAAGAA
6 AGCAGAGAAGAGATGAGTTGTCGAGTGAGGCGTAAG
7 AACGTTGAAAAATTATCCCGTCAACAGTCTCCAGAA
8 GCCAGAGAGTAAAATATTGGGTGAAGCCAGAGAGTA
9 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
file 2:
1 AGCATTTTTCAAACGAAAGATTTACTACCGATGTGT
2 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
4 CCGACACAGAGAACATTAGAATACTCAGAGCCATNN
5 TAAGCCTGAGCCTAAACCTAAGCCTAAACATAAGAA
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
8 GCCAGAGAGTAAAATATTGGGTGAAGCCAGAGAGTA
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
I'd like to compare only the second column of each file, line by line, and output a third file containing only the non-matching lines.
output:
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
You can use awk:
awk 'NR==FNR{a[$2];next} !($2 in a)' file1 file2
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Explanation:
NR == FNR { # While processing the first file
a[$2] # just push the second field in an array
next # move to next record of first file
}
!($2 in a) # print lines from file2 whose second field is not a key in array a
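Note that the one-liner above checks whether the second field of a file2 line occurs anywhere in file1, not whether it matches the line with the same number. If a strictly line-by-line comparison is wanted, one possible sketch is to index the array by FNR instead. The sample data here is shortened and hypothetical, not the original sequences, and the temporary directory is only there to keep the example self-contained.

```shell
# Shortened, hypothetical sample data (not the original sequences).
tmpdir=$(mktemp -d)
printf '%s\n' '1 AAAA' '2 CCCC' '3 GGGG' '4 CCCC' > "$tmpdir/file1"
printf '%s\n' '1 AAAA' '2 CCCC' '3 NNNN' '4 GGGG' > "$tmpdir/file2"

# First file: remember column 2 under its line number (FNR).
# Second file: print the line if its column 2 differs from the value
# stored for the same line number.
result=$(awk 'NR==FNR { a[FNR] = $2; next } $2 != a[FNR]' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"
# prints:
# 3 NNNN
# 4 GGGG

rm -rf "$tmpdir"
```

Line 4 is the interesting case: GGGG does occur in file1 (on line 3), so the set-based version would not report it, while the FNR-indexed version does.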
grep -vf file1 file2
Output:
3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
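One caveat: with plain -f, grep treats every line of file1 as a regular expression and matches it anywhere inside a line of file2, and it compares whole lines (both columns), not just the second one. A slightly stricter sketch adds -F (fixed strings) and -x (whole-line match). The two small files below are hypothetical and only meant to show the substring case.

```shell
# Hypothetical sample data chosen so that one pattern is a substring
# of a different line in the second file.
tmpdir=$(mktemp -d)
printf '%s\n' '1 AAAA' '11 CCCC' > "$tmpdir/file1"
printf '%s\n' '1 AAAA' '11 AAAA' > "$tmpdir/file2"

# -F: patterns are fixed strings, -x: a pattern must match the whole line,
# -v: print lines that match no pattern, -f: read patterns from file1.
result=$(grep -vxFf "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"
# prints:
# 11 AAAA

rm -rf "$tmpdir"
```

Without -x, the pattern "1 AAAA" would also match inside "11 AAAA", so that line would be dropped from the output even though it is not present in file1.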
You could use diff for this. diff will print out the differences between the two files.
/test>diff file1 file2
3c3
< 3 GATCGAACCGGCTGCCTACTGCGTGTAAAGCCGCCC
---
> 3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
6,7c6,7
< 6 AGCAGAGAAGAGATGAGTTGTCGAGTGAGGCGTAAG
< 7 AACGTTGAAAAATTATCCCGTCAACAGTCTCCAGAA
---
> 6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
9c9
< 9 TGCTCACCAACAAAAACAGGCGTCTCAGCAGCAGCA
---
> 9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
Grepping for just differences from the second file:
/test>diff file1 file2 | grep ">"
> 3 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 6 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 7 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
> 9 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
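To get exactly the requested output format, the leading "> " can be stripped as well. A minimal sketch, assuming sed is available and using shortened, hypothetical data. Keep in mind that diff compares whole lines, so a difference in the first column alone would also be reported.

```shell
# Shortened, hypothetical sample data (not the original sequences).
tmpdir=$(mktemp -d)
printf '%s\n' '1 AAAA' '2 CCCC' '3 GGGG' > "$tmpdir/file1"
printf '%s\n' '1 AAAA' '2 NNNN' '3 GGGG' > "$tmpdir/file2"

# Keep only the lines diff marks as coming from the second file
# (those starting with "> ") and remove that marker.
result=$(diff "$tmpdir/file1" "$tmpdir/file2" | sed -n 's/^> //p')
printf '%s\n' "$result"
# prints:
# 2 NNNN

rm -rf "$tmpdir"
```

Anchoring on "^> " is a bit safer than grep ">", which would also pick up any line that merely contains a ">" character somewhere.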