When a novice (like me) asks about reading or processing a text file in Python, they often get answers like:
with open("input.txt", 'r') as f:
    for line in f:
        pass  # do your stuff
Now I would like to truncate everything in the file I'm reading after a particular line. After modifying the example above, I use:
with open("input.txt", 'r+') as file:
    for line in file:
        print line.rstrip("\n\r")  # for debug
        if line.rstrip("\n\r") == "CC":
            print "truncating!"  # for debug
            file.truncate()
            break
and I expect it to throw away everything after the first "CC" line. Running this code on input.txt:
AA
CC
DD
the following is printed on the console (as expected):
AA
CC
truncating!
but the file "input.txt" stays unchanged!?!?
How can that be? What am I doing wrong?
Edit: After the operation I want the file to contain:
AA
CC
In simple terms, truncating a file means removing its contents without deleting the file itself. Truncating a file is much faster and easier than deleting the file, recreating it, and setting the correct permissions and ownership.
In databases and computer networking, data truncation occurs when data or a data stream (such as a file) is stored in a location too short to hold its entire length.
The Unix truncate command, when given a size of zero (truncate -s 0 file), eliminates all the contents of a file. It does not delete the file itself but leaves it as a zero-byte file on the disk, so the file's permissions and ownership are preserved.
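For comparison, here is a minimal Python sketch of the same idea, assuming Python 3.5 or later, where os.truncate() accepts a path on both Unix and Windows. The helper name empty_file and the file name demo.txt are hypothetical; the file lives in a temporary directory.

```python
import os
import tempfile


def empty_file(path):
    """Shrink the file at path to zero bytes without deleting it (hypothetical helper)."""
    # Roughly what `truncate -s 0 path` does on Unix: the file itself,
    # with its permissions and ownership, stays in place; only the data goes.
    os.truncate(path, 0)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("AA\nCC\nDD\n")
    empty_file(path)
    still_exists = os.path.exists(path)
    size_after = os.path.getsize(path)

print(still_exists, size_after)  # True 0
```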
With truncate(), you can choose where the file is cut: truncate(size) resizes the file to size bytes, discarding everything after that point, and without an argument it cuts at the current position in the file. This differs from opening the file in 'w' mode, which always wipes the whole file clean.
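A small Python 3 sketch of both forms, using a hypothetical temporary file. It opens the file in binary mode ('r+b') so that byte offsets are exact on every platform; the helper name demo_truncate is made up for illustration.

```python
import os
import tempfile


def demo_truncate(path):
    """Return the file contents after an explicit-size and a current-position truncate."""
    with open(path, "wb") as f:
        f.write(b"AA\nCC\nDD\n")

    # Explicit size: keep only the first 6 bytes (b"AA\nCC\n").
    with open(path, "r+b") as f:
        f.truncate(6)
    with open(path, "rb") as f:
        after_size = f.read()

    # No argument: cut at the current position, here byte 3 after seek().
    with open(path, "r+b") as f:
        f.seek(3)
        f.truncate()
    with open(path, "rb") as f:
        after_position = f.read()

    return after_size, after_position


with tempfile.TemporaryDirectory() as tmp:
    sized, positioned = demo_truncate(os.path.join(tmp, "demo.txt"))

print(sized)       # b'AA\nCC\n'
print(positioned)  # b'AA\n'
```

Opening the file with 'w' instead of 'r+b' would have emptied it before either truncate ran.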
It looks like you're falling victim to a read-ahead buffer used internally by Python. From the Python 2 documentation for the file.next() method:
A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
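The upshot is that the file's position is not where you would expect it to be when you truncate: the read-ahead buffer has already consumed more of the file than the loop has handed to you, so truncate() cuts at that later position (for a small file, the end of the file) and nothing is removed. One way around this is to use readline() to loop over the file, rather than the iterator: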
line = file.readline()
while line:
    ...
    line = file.readline()
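The documentation quoted above describes Python 2's file object, but Python 3's io stack reads ahead as well. A minimal sketch, assuming Python 3 and a hypothetical temporary file (show_read_ahead is a made-up helper): in binary mode, f.tell() reports the logical position while f.raw.tell() shows how far the underlying OS file offset has already advanced; in text mode, calling tell() inside a for loop raises OSError.

```python
import os
import tempfile


def show_read_ahead(path):
    with open(path, "wb") as f:
        f.write(b"AA\nCC\nDD\n")

    # Binary mode: read one line via iteration, then compare positions.
    with open(path, "r+b") as f:
        for line in f:
            logical = f.tell()   # just after b"AA\n"
            raw = f.raw.tell()   # OS offset, already past the buffered data
            break

    # Text mode: tell() is disabled while next() drives a for loop.
    tell_error = None
    with open(path, "r+") as f:
        for line in f:
            try:
                f.tell()
            except OSError as exc:
                tell_error = exc
            break

    return logical, raw, tell_error


with tempfile.TemporaryDirectory() as tmp:
    logical, raw, tell_error = show_read_ahead(os.path.join(tmp, "demo.txt"))

print(logical, raw > logical, type(tell_error).__name__)  # 3 True OSError
```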
In addition to glibdud's answer: truncate() needs the position at which to cut the file. You can get the current position in your file with the tell() method. As he mentioned, when you use a for loop, next() interferes with other file methods like tell() (in Python 3 text mode, calling tell() inside such a loop raises an OSError). In the suggested while loop, however, you can truncate at the current tell() position. So the complete code would look like this:
Python 3:
with open("test.txt", 'r+') as file:
    line = file.readline()
    while line:
        print(line.strip())
        if line.strip() == "CC":
            print("truncating")
            file.truncate(file.tell())
            break
        line = file.readline()
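A variant of the same fix, offered as an alternative rather than the answer's own method: in Python 3, a file opened in binary mode keeps tell() accurate during a for loop (only text mode disables it), so the iterator can be kept. A minimal sketch; truncate_after_line is a hypothetical helper name, and the marker is assumed to be ASCII/UTF-8 text.

```python
import os
import tempfile


def truncate_after_line(path, marker):
    """Cut the file right after the first line equal to marker.

    Returns True if the marker was found (and the file truncated), else False.
    Binary mode makes f.tell() a plain byte offset that works during iteration.
    """
    target = marker.encode()  # assumes the file is ASCII/UTF-8
    with open(path, "r+b") as f:
        for line in f:
            if line.rstrip(b"\r\n") == target:
                f.truncate(f.tell())  # keep everything up to and including this line
                return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")
    with open(path, "wb") as f:
        f.write(b"AA\nCC\nDD\n")
    found = truncate_after_line(path, "CC")
    with open(path, "rb") as f:
        result = f.read()

print(found, result)  # True b'AA\nCC\n'
```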