
sed/awk - print text between patterns spanned across multiple lines

Tags: bash, sed, awk

I am new to scripting and was trying to learn how to extract any text that exists between two different patterns. However, I am still not able to figure out how to extract text between two patterns in the following scenario:

If I have my input file reading:

Hi I would like
to print text
between these 
patterns

and my expected output is like:

I would like
to print text
between these 

i.e. my first search pattern is "Hi"; I want to skip the pattern itself but print everything that follows it on the same line. My second search pattern is "patterns", and I want to avoid printing that line or any line after it.

I tried the following:

sed -n '/Hi/,/patterns/p' test.txt 

[output]

Hi I would like
to print text
between these 
patterns 

Next, I tried:

awk ' /'"Hi"'/ {flag=1;next} /'"pattern"'/{flag=0} flag { print }' test.txt

[output]

to print text
between these

Can someone help me out in identifying how to achieve this? Thanks in advance

Amarnath Revanna asked Oct 23 '12 04:10


4 Answers

You have the right idea (a mini state machine in awk), but you need some slight modifications, as in the following transcript:

pax> echo 'Hi I would like
to print text
between these 
patterns ' | awk '
    /patterns/ { echo = 0 }
    /Hi /      { gsub("^.*Hi ", "", $0); echo = 1 }
               { if (echo == 1) { print } }'

Or, in compressed form:

awk '/patterns/{e=0}/Hi /{gsub("^.*Hi ","",$0);e=1}{if(e==1){print}}'

The output of that is:

I would like
to print text
between these 

as requested.

The way this works is as follows. The echo variable is initially 0 meaning that no echoing will take place.

Each line is checked in turn. If it contains patterns, echoing is disabled.

If it contains Hi followed by a space, echoing is turned on and gsub is used to modify the line, removing everything up to and including the "Hi " itself.

Then, regardless, the line (possibly modified) is echoed when the echo flag is on.

Now, there are going to be edge cases such as:

  • lines containing two occurrences of Hi; or
  • lines containing something before the patterns.

You haven't specified how they should be handled so I didn't bother, but the basic concept should be the same.
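
For illustration, here is one possible way to cover both edge cases. It is only a sketch under assumed semantics, since the question does not define them: output starts after the first "Hi " on a line, and any text that precedes "patterns" on the closing line is kept. The function name extract_between is hypothetical, and index() is used instead of a regex because awk regexes are greedy and would strip up to the last "Hi ".

```shell
# Hypothetical helper: print the text after the first "Hi " up to,
# but not including, the next "patterns". Reads files or stdin.
extract_between() {
  awk '
    # Outside a range: look for the first "Hi " on the line.
    !inside {
      start = index($0, "Hi ")
      if (start == 0) next            # no start marker, skip the line
      $0 = substr($0, start + 3)      # keep only what follows "Hi "
      inside = 1
    }
    # Inside a range: stop at "patterns", keeping any text before it.
    {
      stop = index($0, "patterns")
      if (stop > 0) {
        if (stop > 1) print substr($0, 1, stop - 1)
        inside = 0
        next
      }
      print
    }
  ' "$@"
}
```

It can be called as `extract_between test.txt` or at the end of a pipe. A second "Hi " on the same line, after the first, is simply printed as ordinary text.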

paxdiablo answered Nov 13 '22 11:11


Updated the solution to remove the "patterns" line:

$ sed -n '/^Hi/,/patterns/{s/^Hi //;/^patterns/d;p;}' file
I would like
to print text
between these

Guru answered Nov 13 '22 11:11


This might work for you (GNU sed):

sed '/Hi /!d;s//\n/;s/.*\n//;ta;:a;s/patterns.*$//;tb;$!{n;ba};:b;/^$/d' file
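
The one-liner is dense, so here is the same script split one command per line with comments, as a reading aid. This is a sketch that assumes GNU sed (it relies on \n in the replacement text); the wrapper name extract_with_sed is hypothetical and not part of the original answer.

```shell
# Hypothetical wrapper around the one-liner above; assumes GNU sed.
extract_with_sed() {
  sed '
    # Delete every line that does not contain "Hi ".
    /Hi /!d
    # The empty regex reuses /Hi /: replace the match with a newline marker,
    s//\n/
    # then drop everything up to and including that marker.
    s/.*\n//
    # Branch to :a; this only serves to reset the flag tested by "t".
    ta
    :a
    # Cut "patterns" and the rest of the line; if that matched, go to :b.
    s/patterns.*$//
    tb
    # Otherwise, unless this is the last line: print, read the next, loop.
    $!{
      n
      ba
    }
    :b
    # Drop the line if nothing is left of it.
    /^$/d
  ' "$@"
}
```

Usage is the same as for the original command, for example `extract_with_sed file`.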

potong answered Nov 13 '22 09:11


Just set a flag (f) when you find+replace Hi at the start of a line, clear it when you find patterns, then invoke the default print when the flag is set:

$ awk 'sub(/^Hi /,""){f=1} /patterns/{f=0} f'  file
I would like
to print text
between these

Ed Morton answered Nov 13 '22 09:11