How to delete all lines before the first and after the last occurrence of a string?

Tags: regex, grep, sed, awk

cat grab.txt

My Dashboard
Fnfjfjf. random test
00:50

1:01:56
My Notes
No data found.

                                
Change Language                                                                                                                  + English                                                          

Submit


Estimation of Working Capital Lecture 1

Estimation of Working Capital Lecture 2

Estimation of Working Capital Lecture 3

Money Market Lecture 254

Money Market Lecture 255

Money Market Lecture 256

International Trade Lecture 257

International Trade Lecture 258

International Trade Lecture 259
Terms And Conditions
84749473837373
Random text fifjfofifofjfkfkf

I need to filter this text as follows:

1. Delete all lines before the first occurrence of the word Lecture.
2. Delete all lines after the last occurrence of the word Lecture.
3. Remove all empty lines.

Expected output

Estimation of Working Capital Lecture 1
Estimation of Working Capital Lecture 2
Estimation of Working Capital Lecture 3
Money Market Lecture 254
Money Market Lecture 255
Money Market Lecture 256
International Trade Lecture 257
International Trade Lecture 258
International Trade Lecture 259
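One literal way to carry out the three steps is a two-pass awk sketch. The first pass over the file records the line numbers of the first and last lines containing Lecture, and the second pass prints only the non-empty lines between them. The function name `trim_lecture` is hypothetical, and because the file is read twice it has to be a regular file rather than a pipe.

```shell
# trim_lecture FILE (hypothetical helper name)
# Prints the lines of FILE from the first to the last line containing
# "Lecture", skipping empty and whitespace-only lines.
trim_lecture() {
  awk '
    # Pass 1 (NR == FNR only while reading the first copy of the file):
    # remember the first and last line numbers that contain "Lecture".
    NR == FNR {
      if (/Lecture/) {
        if (!first) first = FNR
        last = FNR
      }
      next
    }
    # Pass 2: print lines inside [first, last] that have at least one field.
    first && FNR >= first && FNR <= last && NF
  ' "$1" "$1"
}
```

Usage: `trim_lecture grab.txt`. Testing `NF` instead of matching an empty line also drops whitespace-only lines, because awk's default field splitting finds no fields in them.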

What have I tried so far?

cat grab.txt | sed -r '/^\s*$/d; /Lecture/,$!d'

After some searching and trial and error, I can remove the empty lines and all lines before the first occurrence, but I can't work out how to remove all lines after the last occurrence.
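The sed range `/Lecture/,$` is applied line by line, so when sed reaches a Lecture line it cannot yet know whether another one follows. A common workaround, sketched here assuming `tac` from GNU coreutils is available (on macOS/BSD, `tail -r` reverses lines instead), is to apply the same range trick to the reversed text: reverse, drop everything before the first Lecture line, then reverse back. The helper name is hypothetical.

```shell
# trim_lecture_sed FILE (hypothetical helper name); assumes tac from GNU coreutils.
trim_lecture_sed() {
  sed '/Lecture/,$!d' "$1" |      # drop everything before the first "Lecture" line
    tac |                         # reverse: the last "Lecture" line now comes first
    sed '/Lecture/,$!d' |         # drop what precedes it, i.e. the original trailing lines
    tac |                         # restore the original order
    sed '/^[[:space:]]*$/d'       # remove empty and whitespace-only lines
}
```

Usage: `trim_lecture_sed grab.txt`. Using `[[:space:]]` instead of `\s` keeps the blank-line filter within POSIX bracket expressions.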

Note: even though my existing command uses sed, an answer in awk, perl, or grep is fine.
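Since grep is acceptable, another option is to let `grep -n` report the line numbers of the first and last matches and then print that range with `sed -n`. This is a sketch with a hypothetical helper name; it reads the file three times, so it assumes a regular file, and it uses `local`, which bash and dash support.

```shell
# trim_lecture_grep FILE (hypothetical helper name)
trim_lecture_grep() {
  local first last
  first=$(grep -n 'Lecture' "$1" | head -n 1 | cut -d: -f1)   # line number of the first match
  last=$(grep -n 'Lecture' "$1" | tail -n 1 | cut -d: -f1)    # line number of the last match
  [ -n "$first" ] || return 0                                # no "Lecture" line: print nothing
  sed -n "${first},${last}p" "$1" | grep -v '^[[:space:]]*$'   # print the range, minus blank lines
}
```

Usage: `trim_lecture_grep grab.txt`.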


asked Jun 21 '20 02:06

Sachin

1 Answer

Could you please try the following? It was written and tested with the shown samples in GNU awk; replace Input_file with the name of your file (grab.txt in the question).

awk '
/Lecture/{
  found=1
}
found && NF{
  val=(val?val ORS:"")$0
}
END{
  if(val){
    match(val,/.*Lecture [0-9]+/)
    print substr(val,RSTART,RLENGTH)
  }
}'  Input_file

Explanation: here is the same program with a comment on each step.

awk '                                        ##Start the awk program.
/Lecture/{                                   ##If the current line contains Lecture,
  found=1                                    ##set found to 1; it stays set for the rest of the input.
}
found && NF{                                 ##Once found is set, for each line with at least one field (skips empty and whitespace-only lines):
  val=(val?val ORS:"")$0                     ##append the line to val, separated by ORS (a newline by default).
}
END{                                         ##After all input has been read:
  if(val){                                   ##if val is non-empty,
    match(val,/.*Lecture [0-9]+/)            ##match from the start of val up to the last "Lecture <digits>" (.* is greedy),
    print substr(val,RSTART,RLENGTH)         ##and print only the matched part, which leaves out everything after the last Lecture line.
  }
}' Input_file                                ##Input_file is the file to read.
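The answer above keeps everything after the first Lecture line in one string and trims it with a greedy match on `Lecture [0-9]+`. That relies on the last Lecture line containing `Lecture` followed by a space and a number, and any text after that number on the last line would be cut off. A streaming variation, sketched here with a hypothetical helper name, avoids the regex: it prints each Lecture line immediately, holds the non-empty lines that follow it, and prints the held lines only when another Lecture line shows up. Whatever is still held at the end of input follows the last occurrence, so it is never printed.

```shell
# trim_lecture_stream [FILE...] (hypothetical helper name); reads stdin if no FILE is given.
trim_lecture_stream() {
  awk '
    /Lecture/ {
      found = 1
      for (i = 1; i <= n; i++) print buf[i]   # flush lines held since the previous "Lecture" line
      n = 0
      print                                   # print the "Lecture" line itself
      next
    }
    found && NF { buf[++n] = $0 }             # hold non-empty lines; they may be trailing lines
  ' "$@"
}
```

Usage: `trim_lecture_stream grab.txt` or `cat grab.txt | trim_lecture_stream`. Only the lines between two consecutive Lecture lines are kept in memory, instead of everything after the first one.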

answered Nov 12 '22 12:11

RavinderSingh13
