I have a fasta file like the test one here:
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT
CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT
>HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 2:N:0:GCCAAT
AAAACATAAATTTGAGCTTGACAAAAATTAAAAATGAGCCCAGCCTTATATCTGAAATGTGTTTATATGGCTTGCAAGGTTGCAACAAGTGCAGTTTCCAA
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 1:N:0:GCCAAT
ATATTTGAATTATCAGAAATAAACACAAAGAAAACCTAGAACAGATAATTTCTTCCACATTATTGATCAGATACAGATTTCAAGGGTACCGTTGTGAATTG
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 1:N:0:GCCAAT
CTTACTTTGCCTCTCTCAGCCAATGTCTCCTGAGTCTAATTTTTTGGAGGCTAAGCTATGAGCTAATGATGGGTTCCATTTGGGGCCAATGCTTCAGCCTG
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
When i type in a simple grep command like:
grep -B1 "CTT" test.fasta
I get a really strange output in which "--" is sometimes placed on a newline above the grep hit like so:
>HWI-D00196:168:C66U5ANXX:4:1304:10466:100132 2:N:0:GCCAAT
AAACGATTGATAGATCTATTTGCATTATAAAAACATTAAAAAAACAAAATACTGATTAAATGTCGTCTTTCTATTCCACAATTTTATAGATCTCACTGTAT
--
>HWI-D00196:168:C66U5ANXX:4:1307:12056:64030 2:N:0:GCCAAT
CTATTAGTTCTTATCTTTGCCTGCAAATATAAGACTAGCGCTTGAGTAGCTGACAGAGACAAAGTAAGCTGGAGTGTTTATCACCTGGTCACTCCAATTGT
I can't figure out why some fasta entries have this and others don't. I don't get this problem when i remove the -B1. I can remove those lines from my file with a grep -v "--"
statement, but I'd really like to understand what's going on here.
You are asking for one line of leading context by using the -B1
option. This means grep will display both the line which matched and the line directly before it. Each match will be separated by --
on a line by itself as shown below:
$ man grep | grep -B1 context
-A num, --after-context=num
Print num lines of trailing context after each match. See also
--
-B num, --before-context=num
Print num lines of leading context before each match. See also
--
-C[num, --context=num]
Print num lines of leading and trailing context surrounding each
--
--context[=num]
Print num lines of leading and trailing context. The default is
The reason you aren't seeing --
between every match is that the context is only displayed above a sequence of consecutive matches. So see the following example:
seq 13 | grep -B1 1
1
--
9
10
11
12
13
The seq command produces all the numbers between 1 and 13. Only the first line and the lines from 10 on contain a 1, so you see the 1 in its own group, then --
, then the one line context, then the group of consecutive matching lines.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With