I have a text file in the following format:
DELIMITER1
extract me
extract me
extract me
DELIMITER2
I'd like to extract every block of extract me
s between DELIMITER1 and DELIMITER2 in the .txt file
This is my current, non-performing code:
import re
def GetTheSentences(file):
fileContents = open(file)
start_rx = re.compile('DELIMITER')
end_rx = re.compile('DELIMITER2')
line_iterator = iter(fileContents)
start = False
for line in line_iterator:
if re.findall(start_rx, line):
start = True
break
while start:
next_line = next(line_iterator)
if re.findall(end_rx, next_line):
break
print next_line
continue
line_iterator.next()
Any ideas?
Use readlines() to Read the range of line from the File The readlines() method reads all lines from a file and stores it in a list. You can use an index number as a line number to extract a set of lines from it. This is the most straightforward way to read a specific line from a file in Python.
You can simplify this to one regular expression using re.S
, the DOTALL flag.
import re
def GetTheSentences(infile):
with open(infile) as fp:
for result in re.findall('DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
print result
# extract me
# extract me
# extract me
This also makes use of the non-greedy operator .*?
, so multiple non-overlapping blocks of DELIMITER1-DELIMITER2 pairs will all be found.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With