
Delete Lines : after pattern1 and between pattern2 and pattern3 using awk/sed/perl

Tags: sed, awk, perl

I need to delete the lines of a file that come after pattern1 and lie between pattern2 and pattern3, as shown below:

aaaaaaaa 
bbbbbbbb
pattern1   <-----After this line
cdededed
ddededed
pattern2
fefefefe   <-----Delete this line
efefefef   <-----Delete this line
pattern3
adsffdsd
huaserew

Can you suggest how this can be done using awk, sed, or perl?

user1446027 asked Jun 09 '12 09:06


4 Answers

sed '/pattern1/,${ /pattern2/,/pattern3/{/pattern2/b; /pattern3/b; d;} };' file

Formatted:

/pattern1/,$ {
    /pattern2/,/pattern3/ {
        /pattern2/b;
        /pattern3/b; 
        d;
    } 
}

Explained:

  • /pattern1/,$ is the range from the first line matching pattern1 through the end of the file
  • /pattern2/,/pattern3/ is the range of lines from pattern2 through pattern3, both boundary lines included
  • /pattern2/b; and /pattern3/b; branch to the end of the script on the pattern2 and pattern3 lines, which are part of the range and would otherwise be deleted too (see the sed FAQ)
  • d deletes the remaining lines in the range

Update

From the comments, the inner block can be rewritten:

//!d

where:

  • // (an empty pattern) matches the last-used regex (which in this case is pattern2 on the line that opens the range and pattern3 on every later line of the range)
  • ! inverts the next command so that it applies to everything except lines matching the pattern
  • d deletes these lines

So the complete, rewritten script is:

/pattern1/,$ {
    /pattern2/,/pattern3/ {
        //!d
    } 
}
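
As a quick check, the rewritten script can be run against a cleaned-up copy of the sample from the question. This is only a sketch, assuming GNU sed; the file name sample.txt and the variable sed_out are made up for the demonstration.

```shell
# Recreate the sample input from the question (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
aaaaaaaa
bbbbbbbb
pattern1
cdededed
ddededed
pattern2
fefefefe
efefefef
pattern3
adsffdsd
huaserew
EOF

# From pattern1 to the end of the file, inside each pattern2..pattern3 range,
# delete every line that does not match the most recently used regex.
sed_out=$(sed '
/pattern1/,$ {
    /pattern2/,/pattern3/ {
        //!d
    }
}' sample.txt)

printf '%s\n' "$sed_out"
```

With this input, the fefefefe and efefefef lines should disappear while the pattern2 and pattern3 lines stay in place.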
beerbajay answered Nov 15 '22 06:11


Use awk like a state machine:

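awk '
    BEGIN {print_line = 1}                          # print everything by default
    /pattern1/ {consider = 1}                       # pattern2 and pattern3 only count after pattern1
    consider && /pattern2/ {print_line = 0; print}  # keep the pattern2 line, then stop printing
    consider && /pattern3/ {print_line = 1}         # resume printing, starting with the pattern3 line
    print_line {print}
' filename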
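
To see why the consider flag matters, here is a small sketch that feeds the same awk program a made-up input containing one pattern2..pattern3 block before pattern1 and one after it. The input lines (keep-me, drop-me, tail) and the variable awk_out are hypothetical and not part of the original question.

```shell
# Hypothetical input: one pattern2..pattern3 block before pattern1, one after.
awk_out=$(printf '%s\n' pattern2 keep-me pattern3 pattern1 pattern2 drop-me pattern3 tail |
awk '
    BEGIN {print_line = 1}
    /pattern1/ {consider = 1}
    consider && /pattern2/ {print_line = 0; print}
    consider && /pattern3/ {print_line = 1}
    print_line {print}
')

printf '%s\n' "$awk_out"
```

Only drop-me should be removed: the first block is left alone because pattern1 has not been seen yet when it goes by.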
glenn jackman answered Nov 15 '22 08:11


If you're looking for a quick solution on the command line using perl, this is an ideal case for the flip-flop operator. There are two ways to interpret the question in edge cases; both behave the same as long as pattern1 comes before pattern2:

  1. If pattern1 comes after pattern2 but before pattern3, delete everything from pattern1 through pattern3.

  2. Or, if pattern1 comes after pattern2 but before pattern3, do nothing until you see another pattern2.

Before we start, take note of the perl argument -p:

-n                assume "while (<>) { ... }" loop around program
-p                assume loop like -n but print line also, like sed

Now, to the first, I give you...

perl -pe'$x ||= /7/; $_= "" if /5/ .. /8/ and $x' <(seq 1 10)
1
2
3
4
5
6
9
10

$x ||= /7/: This sets $x to the return value of /7/ when $x is false. /7/ returns true when it matches, so $x becomes true on the first match, and the nature of ||= is never to assign once the variable is already true.

Then it sets $_ = '' if the current line falls in the range from /5/ to /8/ and $x has already been set to true. Remember the way short-circuiting works: a && b means evaluate b only if a evaluates to true. In this case, merely evaluating a updates the state of the flip-flop operator, which is what we want; yet we only want the $_ = '' to happen once 7 has been seen.

Now, for the second interpretation of the question, just switch the order...

perl -pe'$x ||= /7/; $_= "" if $x and /5/ .. /8/' <(seq 1 10)

This will print the full range. Perl won't start looking for /5/ until after it finds /7/. In our sequential range that never happens, because 5 has already gone by when 7 shows up.

BTW, to really put some of these answers to shame, many of the spaces are not required...

perl -pe'$x||=/2/;$_=""if$x&&/5/../8/' # secksey
NO WAR WITH RUSSIA answered Nov 15 '22 06:11


Completing the Rosetta Stone:

perl -ne '++$saw_pattern1 if /pattern1/;
          $inside = ($saw_pattern1 && /pattern2/) .. /pattern3/;
          print unless $inside && ($inside > 1 && $inside !~ /E0$/)' \
  input

The code takes advantage of Perl’s .. range operator.

In scalar context, .. returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each .. operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn’t become false till the next time the range operator is evaluated …

The right operand is not evaluated while the operator is in the false state, and the left operand is not evaluated while the operator is in the true state. The precedence is a little lower than || and &&. The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string E0 appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.

Greg Bacon answered Nov 15 '22 06:11