I need to delete the lines in a file that come after pattern1 and lie between pattern2 and pattern3, as shown below:
aaaaaaaa
bbbbbbbb
pattern1 <-----After this line
cdededed
ddededed
pattern2
fefefefe <-----Delete this line
efefefef <-----Delete this line
pattern3
adsffdsd
huaserew
Please can you suggest how this can be done using awk or sed or in perl.
sed '/pattern1/,${ /pattern2/,/pattern3/{/pattern2/b; /pattern3/b; d;} };' file
Formatted:
/pattern1/,$ {
    /pattern2/,/pattern3/ {
        /pattern2/b;
        /pattern3/b;
        d;
    }
}
Explained:
/pattern1/,$ is the range of lines from pattern1 to the end of the file.
/pattern2/,/pattern3/ is the range of lines between pattern2 and pattern3.
/pattern2/b; and /pattern3/b; skip the pattern2 and pattern3 lines, which are otherwise included in the range (see the sed FAQ).
d deletes the other lines in the range.
Update
From the comments, the inner block can be rewritten:
//!d
where:
// (an empty pattern) matches the last-used regex (which in this case is pattern2 or pattern3, whichever the range tested last).
! inverts the next command so that it applies to everything except lines matching the pattern.
d deletes these lines.
So the complete, rewritten pattern is:
/pattern1/,$ {
    /pattern2/,/pattern3/ {
        //!d
    }
}
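To check the script against a case the question does not show, here is a minimal sketch that feeds it an input with one pattern2/pattern3 block before pattern1 and one after it. The names delete_between and sample are hypothetical, and the script is written one command per line on the assumption that this is more portable than putting ; after b.

```shell
# delete_between: hypothetical wrapper around the sed script shown above.
delete_between() {
  sed '
/pattern1/,$ {
  /pattern2/,/pattern3/ {
    /pattern2/b
    /pattern3/b
    d
  }
}'
}

# sample: hypothetical input with a pattern2..pattern3 block on each side
# of pattern1.
sample() {
  printf '%s\n' pattern2 keep1 pattern3 pattern1 cdededed \
                pattern2 fefefefe efefefef pattern3 adsffdsd
}

sample | delete_between
```

Only fefefefe and efefefef are removed. The keep1 line survives because its block ends before pattern1 is seen.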
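Use awk like a state machine:
awk '
  BEGIN {print_line = 1}                          # print everything by default
  /pattern1/ {consider = 1}                       # start watching for pattern2 once pattern1 is seen
  consider && /pattern2/ {print_line = 0; print}  # keep the pattern2 line, then stop printing
  consider && /pattern3/ {print_line = 1}         # resume printing, starting with the pattern3 line
  print_line {print}
' filename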
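If the three patterns are not fixed strings in the script, the same state machine can take them as variables. This is only a sketch: the function name delete_inner and the variable names p1, p2 and p3 are made up for illustration, and each value is treated as a regular expression (awk also processes backslash escapes in -v values).

```shell
# delete_inner P1 P2 P3: hypothetical wrapper. Reads stdin and drops the lines
# strictly between P2 and P3, but only once P1 has been seen.
delete_inner() {
  awk -v p1="$1" -v p2="$2" -v p3="$3" '
    BEGIN { print_line = 1 }
    $0 ~ p1 { consider = 1 }
    consider && $0 ~ p2 { print_line = 0; print }
    consider && $0 ~ p3 { print_line = 1 }
    print_line { print }
  '
}

# One block before pattern1 (left alone) and two blocks after it (trimmed).
printf '%s\n' pattern2 keep1 pattern3 pattern1 \
              pattern2 drop1 pattern3 mid pattern2 drop2 pattern3 tail |
  delete_inner pattern1 pattern2 pattern3
```

With this input, drop1 and drop2 disappear and every other line is printed once, including keep1.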
If you're looking for a quick solution on the command line using perl, this is an ideal case for the flip-flop operator. There are two ways to interpret the question in edge cases; both behave the same as long as pattern1 comes before pattern2:
If pattern1 comes after pattern2 but before pattern3, delete everything between pattern1 and pattern3; or,
If pattern1 comes after pattern2 but before pattern3, do nothing until another pattern2 is seen.
Before we start, take note of the perl argument -p:
-n assume "while (<>) { ... }" loop around program
-p assume loop like -n but print line also, like sed
Now, for the first interpretation:
perl -pe'$x ||= /7/; $_= "" if /5/ .. /8/ and $x' <(seq 1 10)
1
2
3
4
5
6
9
10
$x ||= /7/: This sets $x to the return value of /7/ when $x is false. /7/ returns true when it matches. This means $x gets set to true on the first match, and the nature of ||= is never to set the variable when it is already true.
Then it sets $_ = '' if the line falls in the range between /5/ and /8/ and $x has already been set to true. Remember the way short-circuiting works: a && b means run b only if a evaluates to true. In this case, the mere fact of evaluating a will set the state of the flip-flop operator, which is what we want; yet we only want the $_ = '' to occur if it has already seen 7.
Now, for the second interpretation of the question, just switch the order:
perl -pe'$x ||= /7/; $_= "" if $x and /5/ .. /8/' <(seq 1 10)
This will print the full range. Perl won't start looking for /5/ until after it finds /7/. In our sequential range that won't happen.
BTW, to really put some of these answers to shame, many of the spaces are not required...
perl -pe'$x||=/2/;$_=""if$x&&/5/../8/' # secksey
Completing the Rosetta Stone:
perl -ne '++$saw_pattern1 if /pattern1/;
          $inside = ($saw_pattern1 && /pattern2/) .. /pattern3/;
          print unless $inside && ($inside > 1 && $inside !~ /E0$/)' \
  input
The code takes advantage of Perl’s .. range operator.
In scalar context, .. returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each .. operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn’t become false till the next time the range operator is evaluated … The right operand is not evaluated while the operator is in the false state, and the left operand is not evaluated while the operator is in the true state. The precedence is a little lower than || and &&. The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string E0 appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.