I want to delete some specific lines in a file.
The part I want to delete is enclosed between two lines (that will be deleted too), named STARTING_LINE and CLOSING_LINE. If there is no closing line before the end of the file, then the operation should stop.
Example:
...blabla...
[Start] <-- # STARTING_LINE
This is the body that I want to delete
[End] <-- # CLOSING_LINE
...blabla...
I came up with three different ways to achieve this (plus one provided by tdelaney's answer below), and I am wondering which one is best. Please note that I am not looking for a subjective opinion: I would like to know whether there are concrete reasons to choose one method over another.
if conditions (just one for loop):
def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        found_start = False
        found_end = False
        for line in my_file:
            if not found_start and line.strip() == STARTING_LINE.strip():
                found_start = True
            elif found_start and not found_end:
                if line.strip() == CLOSING_LINE.strip():
                    found_end = True
                continue
            else:
                print(line)
                text += line
        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
for loops on the open file:
def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the end of the function
                # Note: the next `for` loop iterates on the following lines, not
                # on the entire my_file (i.e. it is not starting from the first
                # line). This will allow us to avoid manually handling the
                # StopIteration exception.
                found_end = False
                for function_line in my_file:
                    if function_line.strip() == CLOSING_LINE.strip():
                        print("stop")
                        found_end = True
                        break
                if not found_end:
                    print("There is no closing line. Stopping")
                    return False
            else:
                text += line
        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
while True and next() (with StopIteration exception):
def delete_lines(filename):
    with open(filename, 'r+') as my_file:
        text = ''
        for line in my_file:
            if line.strip() == STARTING_LINE.strip():
                # Skip lines until we reach the end of the function
                while True:
                    try:
                        line = next(my_file)
                        if line.strip() == CLOSING_LINE.strip():
                            print("stop")
                            break
                    except StopIteration:
                        print("There is no closing line.")
                        # Stop here; otherwise next() keeps raising and
                        # the while loop never ends.
                        return False
            else:
                text += line
        # Go to the top and write the new text
        my_file.seek(0)
        my_file.truncate()
        my_file.write(text)
itertools (from tdelaney's answer):
import itertools
def delete_lines_iter(filename):
    with open(filename, 'r+') as wrfile:
        with open(filename, 'r') as rdfile:
            # write everything before startline
            wrfile.writelines(itertools.takewhile(lambda l: l.strip() != STARTING_LINE.strip(), rdfile))
            # drop everything before stopline.. and the stopline itself
            try:
                next(itertools.dropwhile(lambda l: l.strip() != CLOSING_LINE.strip(), rdfile))
            except StopIteration:
                pass
            # include everything after
            wrfile.writelines(rdfile)
        wrfile.truncate()
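To see why the itertools version removes both marker lines, it may help to run the same two calls on a plain iterator. The following is only an illustrative sketch: the in-memory list stands in for the file, and the marker strings are assumed from the example at the top of the question.

```python
import itertools

# A small in-memory stand-in for the file's lines (assumed sample data).
lines = iter(["a\n", "[Start]\n", "body\n", "[End]\n", "b\n"])

# takewhile yields lines until the predicate fails. The line that fails
# the test ("[Start]") has already been pulled from the iterator, so it
# is consumed and discarded.
before = list(itertools.takewhile(lambda l: l.strip() != "[Start]", lines))

# dropwhile skips lines until the predicate fails, then yields the failing
# line ("[End]"). Calling next() once consumes exactly that line.
next(itertools.dropwhile(lambda l: l.strip() != "[End]", lines), None)

# Whatever is left in the shared iterator comes after the closing line.
after = list(lines)

print(before)  # ['a\n']
print(after)   # ['b\n']
```

Both helpers read from the same underlying iterator, which is why no explicit bookkeeping of "found start" or "found end" is needed.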
It seems that these four implementations achieve the same result. So...
Question: which one should I use? Which one is the most Pythonic? Which one is the most efficient?
Is there a better solution instead?
Edit: I tried to evaluate the methods on a big file using timeit. To have the same file on each iteration, I removed the writing parts of each function; this means that the evaluation mostly covers the reading (and file opening) work.
import timeit
t_if = timeit.Timer("delete_lines_if('test.txt')", "from __main__ import delete_lines_if")
t_for = timeit.Timer("delete_lines_for('test.txt')", "from __main__ import delete_lines_for")
t_while = timeit.Timer("delete_lines_while('test.txt')", "from __main__ import delete_lines_while")
t_iter = timeit.Timer("delete_lines_iter('test.txt')", "from __main__ import delete_lines_iter")
print(t_if.repeat(3, 4000))
print(t_for.repeat(3, 4000))
print(t_while.repeat(3, 4000))
print(t_iter.repeat(3, 4000))
Result:
# Using IF statements:
[13.85873354100022, 13.858520206999856, 13.851908310999988]
# Using nested FOR:
[13.22578497800032, 13.178281234999758, 13.155530822999935]
# Using while:
[13.254994718000034, 13.193942980999964, 13.20395484699975]
# Using itertools:
[10.547019549000197, 10.506679693000024, 10.512742852999963]
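Since the timings above leave out the write step, one possible way to include it is to restore the file from an untouched copy before every call. This is a rough sketch, not what was measured above: `time_with_writes`, `pristine_path` and `work_path` are hypothetical names, and the reported time includes the cost of the copy, so it is only useful for comparing the functions against each other.

```python
import shutil
import timeit


def time_with_writes(func, pristine_path, work_path, number=100, repeat=3):
    """Time func(work_path), restoring work_path before each call.

    Returns the best per-call time in seconds (copy overhead included).
    """
    def run_once():
        # Start every call from the same unmodified file contents.
        shutil.copyfile(pristine_path, work_path)
        func(work_path)

    # timeit.repeat accepts a callable as the statement to time.
    best_total = min(timeit.repeat(run_once, number=number, repeat=repeat))
    return best_total / number
```

It could then be called as, for example, `time_with_writes(delete_lines_iter, 'test_original.txt', 'test.txt')`, where `test_original.txt` is an assumed backup copy of the benchmark file.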
You can make it fancy with itertools. I'd be interested in how timing compares.
import itertools
def delete_lines(filename):
    with open(filename, 'r+') as wrfile:
        with open(filename, 'r') as rdfile:
            # write everything before startline
            wrfile.writelines(itertools.takewhile(lambda l: l.strip() != STARTING_LINE.strip(), rdfile))
            # drop everything before stopline.. and the stopline itself
            # (the None default avoids StopIteration if there is no stopline)
            next(itertools.dropwhile(lambda l: l.strip() != CLOSING_LINE.strip(), rdfile), None)
            # include everything after
            wrfile.writelines(rdfile)
        wrfile.truncate()
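If the file must stay untouched when there is no closing line, a simpler variant is to read all lines first and only rewrite the file once both markers have been found. This is a minimal sketch under a few assumptions: the function name `delete_block` is made up for illustration, the marker values are taken from the example in the question, and the file is small enough to hold in memory.

```python
# Assumed marker values, taken from the example in the question.
STARTING_LINE = "[Start]"
CLOSING_LINE = "[End]"


def delete_block(filename, start=STARTING_LINE, end=CLOSING_LINE):
    """Remove the first start..end block, markers included.

    Returns True if a block was removed, False if the file was left as-is.
    """
    with open(filename, "r") as src:
        lines = src.readlines()

    stripped = [line.strip() for line in lines]
    try:
        first = stripped.index(start.strip())
        # Only look for the closing line after the starting line.
        last = stripped.index(end.strip(), first + 1)
    except ValueError:
        # No starting line, or no closing line after it: change nothing.
        return False

    with open(filename, "w") as dst:
        dst.writelines(lines[:first] + lines[last + 1:])
    return True
```

Compared with the iterator-based versions, this trades memory for simplicity: nothing is written until the whole block has been located, so a missing closing line cannot leave a half-edited file.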