Sed: how to remove a line matching a pattern and the lines around it?

Question

I have a file from which I want to delete lines matching a pattern, together with the lines above and below them.

For example:

FFFFIFIBBFFFFFFFFFFFFFBBBBFBBBBFBBBB77<<BBBBBB7B<BBBBBB<B<
@HISEQ:102:h9u5badxx:1:1101:13002:2147 1:N:0:CTGT
GATCCCCGTCTATCAGATACACGTTACTCAGCTAGTGCGAATGCGAACGCGAAATTTT
+
FFFFFFFFBBFFFFFFFFFFFFFBFBFFFFFFFFFBFFFBFFFFFBFFFFFFFFFBFB
@HISEQ:102:h9u5badxx:1:1101:15368:2194 1:N:0:CTGT
+
FFIFBFFIFFBBBFFFFFFFBBFFBFFBBBFFFBB7BBBBBBFFFBB700<7770<BBB0<0<BFFBFBFFFFF
@HISEQ:102:h9u5badxx:1:1101:19167:2169 1:N:0:CTGT
GATCTCATATAGGGCAGCGTGGTCGCGGC

I want to remove the second block, which does not contain a nucleotide sequence.

The desired result:

FFFFIFIBBFFFFFFFFFFFFFBBBBFBBBBFBBBB77<<BBBBBB7B<BBBBBB<B<
@HISEQ:102:h9u5badxx:1:1101:13002:2147 1:N:0:CTGT
GATCCCCGTCTATCAGATACACGTTACTCAGCTAGTGCGAATGCGAACGCGAAATTTT
+
FFIFBFFIFFBBBFFFFFFFBBFFBFFBBBFFFBB7BBBBBBFFFBB700<7770<BBB0<0<BFFBFBFFFFF
@HISEQ:102:h9u5badxx:1:1101:19167:2169 1:N:0:CTGT
GATCTCATATAGGGCAGCGTGGTCGCGGC

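This pattern matches the block:

'^.+$(\n)^(@HISEQ).*$(\n)^\+'

It works in Perl and JavaScript, but not in sed, because sed reads its input one line at a time: a newline is only present in the pattern space when lines are explicitly appended to it, for example with N.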

I found this command for working across line breaks:

sed -e ':a;N;$!ba;s/\n/ /' test

But this command replaces a line break with a space. If I put my regex into it:

sed -e ':a;N;$!ba;/^.+$(\n)^(@HISEQ).*$(\n)^\+/d' test

it does not work. Can you help me find a solution to this problem?
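The attempt above fails for a few reasons: in sed's basic regular expressions `+`, `(` and `)` are ordinary characters, `^` and `$` in the middle of the expression do not mark line boundaries, and `d` deletes the entire pattern space, which after `:a;N;$!ba` is the whole file. A minimal sketch of a slurp-based fix, assuming GNU sed and the layout of the first example (quality line, `@HISEQ` header, optional sequence, `+`), deletes only the matched lines with `s///g` and spells out the newlines explicitly. The function name `drop_seqless_blocks` is only for illustration:

```shell
# Sketch, assuming GNU sed ([^\n] inside a bracket expression is a GNU
# extension) and the first layout: quality line, @HISEQ header, sequence, +.
drop_seqless_blocks() {
  # :a;N;$!ba  append every line, so the whole file is one pattern space
  # s/.../g    remove each "\n<line>\n@HISEQ...\n+..." run: the line before
  #            a header, the header itself and the whole + line, whenever
  #            the header is immediately followed by a + line
  # Caveat: the leading \n means a block that starts on line 1 is not matched.
  sed -e ':a;N;$!ba;s/\n[^\n]*\n@HISEQ[^\n]*\n+[^\n]*//g' "$@"
}
```

Usage would be `drop_seqless_blocks test > cleaned`, where `cleaned` is a placeholder output name. Note that this keeps the whole file in memory.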

Update: I had misunderstood the file format. Input:

@HWI-ST383:199:D1L73ACXX:3:1101:1309:1956 1:N:0:ACAGTGA 
+ 
JJJHIIJFIJJJJ=BFFFFFEEEEEEDDDDDDDDDDBD 
@HWI-ST383:199:D1L73ACXX:3:1101:3437:1952 1:N:0:ACAGTGA
GATCTCGAAGCAAGAGTACGACGAGTCGGGCCCCTCCA 
+ 
IIIIFFF<?6?FAFEC@=C@1AE###############

How do I change the regular expression to get the following?

Output:

@HWI-ST383:199:D1L73ACXX:3:1101:3437:1952 1:N:0:ACAGTGA
GATCTCGAAGCAAGAGTACGACGAGTCGGGCCCCTCCA 
+ 
IIIIFFF<?6?FAFEC@=C@1AE###############
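With the corrected layout every record starts at its header, so no loop is needed: it is enough to look at the line that follows each header. A minimal streaming sketch, assuming every header starts with `@HWI` (adjust the prefix for other runs) and that no quality line happens to begin with those characters; the function name `drop_seqless_records` is hypothetical:

```shell
# Sketch for the corrected layout: header, optional sequence, +, quality.
# Assumes headers start with @HWI and no quality line starts with @HWI.
drop_seqless_records() {
  # On a header line, append the following line with N.
  # If that line starts with +, the record has no sequence: append the
  # quality line too and delete all three lines.
  # Otherwise the header and its sequence are printed as usual, and the
  # + and quality lines that follow pass through unchanged.
  sed '/^@HWI/{N;/\n+/{N;d;};}' "$@"
}
```

Because `N` only ever joins a header with the next one or two lines, memory use stays constant, unlike the `:a;N;$!ba` approach, which holds the whole file in the pattern space.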

Wintermute · Accepted Answer

If I understand you correctly, then

sed ':loop; N; /\n+/ ! { $ ! b loop }; /\n@HISEQ[^\n]\+\n+/ d' foo.txt

will work. It breaks down as follows:

:loop                          # in a loop
N                              # fetch more lines
/\n+/ ! { $ ! b loop }         # until one starts with + or is the last line
/\n@HISEQ[^\n]\+\n+/ d         # if the penultimate line of all that begins with @HISEQ,
                               # discard the lot.

The last pattern relies on the fact that it is checked right after the first line beginning with + has been appended, so the + at the end of the pattern can only match the start of the last line in the block.
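When adapting this answer it can help to see which blocks the loop collects before the final `d` test runs. A sketch, assuming GNU sed: it keeps the same loop but prints each block with `l 0` (unambiguous form, no line wrapping) instead of testing it, so embedded newlines appear as `\n` and each block ends with `$`. The name `show_blocks` is hypothetical:

```shell
# Sketch, assuming GNU sed ("l 0" disables line wrapping, a GNU extension).
# Prints each block collected by the accepted answer's loop on one line.
show_blocks() {
  sed -n '
    :loop
    N
    /\n+/!{
      $!b loop
    }
    l 0
  ' "$@"
}
```

With the first example as input, the second line of output should be the block starting at the `FFFFFFFFBB...` quality line, which is the block the answer deletes.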

William Pursell · Answer

To remove the second block, you can just do:

awk 'NR!=2' RS=+ ORS=+ input

But I suspect you want something more like:

awk '/[GATC]{5,}\n/' RS=+ ORS=+ input

or

awk '/\n[GATC]*\n/' RS=+ ORS=+ input
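The `RS=+` approach above splits the input at every `+` character, which suits the first layout. For the corrected layout, a line-oriented awk sketch can read the line after each header with `getline` and drop the header, `+` and quality lines together when that line starts with `+`. It assumes headers start with `@HWI` and that no quality line begins with `@HWI`; the function name `drop_seqless_awk` is hypothetical:

```shell
# Sketch for the corrected layout; assumes headers start with @HWI and that
# no quality line starts with @HWI.
drop_seqless_awk() {
  awk '
    /^@HWI/ {
      header = $0
      if ((getline line) <= 0) {  # header is the last line: keep it
        print header
        next
      }
      if (line ~ /^\+/) {         # no sequence: also consume the quality line
        getline
        next
      }
      print header                # normal record: header and sequence
      print line
      next
    }
    { print }                     # + and quality lines of kept records
  ' "$@"
}
```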

Sed: how to remove a line matching a pattern and the lines around it?

Tags:

regex

bash

sed

Anton Ivankin

3 Answers

Wintermute

William Pursell

Wintermute
