I have some data that looks like this. It comes in chunk of four. Each chunk starts with a @
character.
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
888888888888888888888888888
At the third line of each chunk, I want to remove the text that comes after the +
character, resulting in:
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
Is there a compact way to do that in sed or Perl?
Assuming you just don't want to blindly remove the rest of every line starting with a +
, then you can do this:
sed '/^@/{N;N;s/\n+.*/\n+/}' infile
$ sed '/^@/{N;N;s/\n+.*/\n+/}' infile
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me
*Note: Although the above command keys on the @
to determine if a line with a +
should be altered, it will still alter the 2nd line if it happens to also start with a +
. It doesn't sound like this is the case, but if you want to exclude this corner case as well, the following minor alteration will protect against that:
sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' infile
$ sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' ./infile
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
+AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With