
sed/awk/perl remove the first two lines of a 3 line pattern






I have a huge text file. I need to replace all occurrences of this three-line pattern:

|pattern|some data|
|pattern|some other data|

with the last line of the pattern:

|pattern|some other data|

That is, remove the first two lines of the pattern and keep only the last one.

  • The first line of the pattern starts with |pattern| and does not end with two commas.
  • The second line of the pattern ends with two commas and does not start with |pattern|.
  • The third line of the pattern starts with |pattern| and does not end with two commas.

I tried this:

sed 'N;N;/^|pattern|.*\n.*,,\n|pattern|.*/I,+1 d' trial.txt

without much luck.

Edit: Here is a more substantial example

#!/usr/bin/env bash
cat > trial.txt <<EOL

and it should become:



the first three lines of the file:


satisfy the pattern. So they are replaced by


so the top of the file now becomes:


the first three lines of which are:


which satisfy the pattern, so they are replaced by:


so the top of the file now is:



consider this file:

#!/usr/bin/env bash
cat > trial.txt <<EOL
user189035 asked Jan 26 '23 19:01


2 Answers

Here is a simple take on it, using a buffer to collect and manage the pattern lines:

use warnings;
use strict;
use feature 'say';

my $file = shift or die "Usage: $0 file\n";

open my $fh, '<', $file or die "Can't open $file: $!";

my @buf;

while (<$fh>) {
    chomp;             # say adds the newline back
    if (/^\|pattern\|/ and not /,,$/) {
        if (@buf != 2) { say for @buf }  # buffer is not "first line + ,, line": keep it
        @buf = $_;     # start the buffer (first line) or overwrite (third)
    }
    elsif (/,,$/ and not /^\|pattern\|/) {
        if  (@buf) { push @buf, $_ }  # add to buffer with first line in it
        else       { say }            # not part of 3-line-pattern; print
    }
    else {
        say for @buf;  # time to print out buffer
        @buf = ();     # ... empty it ...
        say            # and print the current line
    }
}
say for @buf;          # flush what is left in the buffer at end of file

This prints the expected output.


  • Pattern lines go in a buffer, and once we get the "third line" the first two need to be removed. So we assign to the array whenever we see ^|pattern|: this either starts the buffer (if it's the first line) or re-initializes it, discarding what's in it (if it's the third line).

  • A line ending with ,, is added to the buffer if the buffer already holds a line. Nothing prevents lines ending with ,, from appearing outside of a pattern; in that case we just print the line.

  • So each |pattern| line sets the buffer straight: it either starts it or resets it. Once we run into a line with neither ^|pattern| nor ,,$ we print out the buffer, and then that line.

Please test this more comprehensively, which I haven't yet had a chance to do.

To run this either in a pipeline or on a file, use the "magical" <> filehandle. The script then becomes:

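use warnings;
use strict;
use feature 'say';

my @buf;

while (<>) {  # reads lines from files given on command line, or from STDIN
    # ... same loop body as in the first script ...
}
say for @buf;  # flush what is left in the buffer at end of input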

Now you can run it either as data | script.pl or as script.pl datafile. (For this, make the script executable and give it a #!/usr/bin/env perl first line, or run it as perl script.pl.)

The script's output goes to STDOUT which can be piped into other programs or redirected to a file.

zdim answered Jan 28 '23 08:01


It depends on how large your file is, but if it fits in the available memory, how about:

perl -0777 -pe '
    1 while s/^\|pattern\|.+?\|\n(?!\|pattern\|).+?,,\n(\|pattern\|.+?\|)$/$1/m;
' trial.txt


tshiono answered Jan 28 '23 09:01
