How do I remove part of a line in a multi-line chunk using sed or Perl?

Tags: linux, unix, sed, perl

I have some data that looks like this. It comes in chunks of four lines, and each chunk starts with a @ character.

@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
888888888888888888888888888

On the third line of each chunk, I want to remove the text that follows the + character, resulting in:

@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888

Is there a compact way to do that in sed or Perl?

neversaint asked Feb 26 '23 06:02



1 Answer

Assuming you don't want to blindly strip the rest of every line that starts with a +, you can key on the @ header line and edit only the + line that follows it:

sed '/^@/{N;N;s/\n+.*/\n+/}' infile

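Output (infile here has an extra trailing line, +Dont remove me, to show that a + line outside a chunk is left alone)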

$ sed '/^@/{N;N;s/\n+.*/\n+/}' infile
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me
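
If it helps to see the pieces of that command, here is a small sketch you can paste into a shell. It assumes GNU sed (the \n in the replacement is a GNU extension), and the file names infile and outfile plus the shortened sample data are just for illustration:

```shell
# Build a small sample file (one chunk from the question plus a stray + line).
cat > infile <<'EOF'
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
+Dont remove me
EOF

# /^@/           run the braces only when the current line starts with @
# N;N            append the next two lines to the pattern space
# s/\n+.*/\n+/   replace everything from the newline-plus onward with a bare +
sed '/^@/{N;N;s/\n+.*/\n+/}' infile > outfile
cat outfile
```

The stray + line survives because it is never part of a pattern space that began with a @ line.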

Note: Although the command above keys on the @ to decide whether a + line should be altered, it will also alter the second line of a chunk if that line happens to start with a +. That doesn't sound like your case, but if you want to exclude this corner case as well, the following minor change protects against it. The greedy \(.*\) consumes as much as it can, so the match is forced onto the last newline followed by a +, which is the third line:

sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' infile

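Output (for this run, the second line of the first chunk was changed to start with a + to exercise the corner case)

$ sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' infile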
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
+AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me
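
One more caveat with keying on @: in FASTQ data a quality line can itself begin with @, and then the /^@/ test may fire on the wrong line and leave the next chunk's + line untouched. Since the question says the data always comes in chunks of four, a position-based variant is a possible alternative. This is only a sketch, assuming every chunk is exactly four lines with no blank lines in between; the file names reads.fq and by_position.fq and the sample records are made up for illustration:

```shell
# Sample input: the first record's quality line starts with @ (hypothetical data).
cat > reads.fq <<'EOF'
@read1
ACGT
+read1
@888
@read2
TTGA
+read2
::::
EOF

# n;n            print lines 1 and 2 of the chunk, leaving line 3 in the pattern space
# s/^+.*/+/      reduce line 3 to a bare + (only if it starts with +)
# n              print line 3 and load line 4, which is printed when the cycle ends
sed 'n;n;s/^+.*/+/;n' reads.fq > by_position.fq
cat by_position.fq

# Perl equivalent of the same idea, using the input line counter $. :
#   perl -pe 's/^\+.*/+/ if $. % 4 == 3' reads.fq
```

Because this never looks for @, the quality line @888 passes through unchanged and both + lines are trimmed.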
SiegeX answered Mar 05 '23 18:03
