Repeatedly extract a line between two delimiters in a text file, Python

Tags: python, regex

I have a text file in the following format:

DELIMITER1
extract me
extract me
extract me
DELIMITER2

I'd like to extract every block of "extract me" lines between DELIMITER1 and DELIMITER2 in the .txt file.

This is my current, non-working code:

import re
def GetTheSentences(file):
     fileContents =  open(file)
     start_rx = re.compile('DELIMITER')
     end_rx = re.compile('DELIMITER2')

     line_iterator = iter(fileContents)
     start = False
     for line in line_iterator:
           if re.findall(start_rx, line):

                start = True
                break
      while start:
           next_line = next(line_iterator)
           if re.findall(end_rx, next_line):
                break

           print next_line

           continue
      line_iterator.next()

Any ideas?

Renklauf asked Aug 17 '11 19:08


1 Answer

You can simplify this to a single regular expression by using re.S (the DOTALL flag), which lets . match newline characters so that one pattern can span several lines.

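import re

def GetTheSentences(infile):
    # Read the whole file so the pattern can match across line breaks.
    with open(infile) as fp:
        # re.S (DOTALL) lets '.' match newlines; '.*?' stops at the nearest DELIMITER2.
        for result in re.findall(r'DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
            print(result)
# extract me
# extract me
# extract me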

The pattern also uses the non-greedy quantifier .*?, which stops at the nearest DELIMITER2, so every non-overlapping DELIMITER1 ... DELIMITER2 block in the file is found separately.
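To see why the non-greedy form matters, here is a minimal sketch that runs both variants on an in-memory string. The sample text and the helper name extract_blocks are assumptions made for illustration and are not part of the original answer. The helper also strips the newlines around each match and splits it into lines, which the answer above does not do.

```python
import re

# Hypothetical sample input containing two delimited blocks.
text = """DELIMITER1
first block, line 1
first block, line 2
DELIMITER2
ignored line
DELIMITER1
second block
DELIMITER2
"""

def extract_blocks(text):
    """Return one list of lines per DELIMITER1 ... DELIMITER2 block."""
    # Non-greedy '.*?' ends each match at the nearest DELIMITER2.
    pattern = re.compile(r"DELIMITER1(.*?)DELIMITER2", re.S)
    # strip() drops the newlines that directly follow and precede the delimiters.
    return [match.strip().splitlines() for match in pattern.findall(text)]

non_greedy = extract_blocks(text)

# Greedy '.*' runs from the first DELIMITER1 to the last DELIMITER2,
# so everything collapses into a single match.
greedy = re.findall(r"DELIMITER1(.*)DELIMITER2", text, re.S)

print(non_greedy)   # two separate blocks
print(len(greedy))  # one oversized match
```

With the greedy pattern the single match also swallows the inner delimiters and the "ignored line" between the blocks, which is usually not what is wanted here.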

Brent Newey answered Oct 06 '22 10:10