
After writing to a file, why does os.path.getsize still return the previous size?

I am trying to split a large XML file into smaller chunks. I write to the output file and then check its size to see whether it has passed a threshold, but I don't think os.path.getsize() is working as expected.

What would be a good way to get the size of a file that is still being written to?

I've done something like this:

import os

# Copy lines up to the first closing </Service> tag, printing the
# size that os.path.getsize reports after every write.
with open('VSERVICE.xml', 'r') as f1, open('split.xml', 'w') as f2:
    for line in f1:
        if line == '</Service>\n':
            break
        f2.write(line)
        size = os.path.getsize('split.xml')
        print('size = ' + str(size))

Running this prints 0 as the file size for about 80 iterations and then 4176. Does Python store the output in a buffer before actually writing it to the file?

Maulin asked Jun 18 '09 16:06



1 Answer

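Yes, Python buffers writes, and file size is different from file position. For example,

os.path.getsize('sample.txt')

returns the size of the file as it currently exists on disk, in bytes. Data still held in Python's write buffer is not counted until the buffer is flushed.

By contrast,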

f = open('sample.txt')
print(f.readline())
print(f.tell())
f.close()

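Here f.tell() returns the current position of the file object, i.e. the offset at which the next read or write will take place. Since it takes buffered data into account, calling tell() on the output file (f2 in the question) should be accurate as long as you are simply appending to it. Alternatively, call f2.flush() before os.path.getsize().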
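To make the difference concrete, here is a minimal sketch that takes three readings around a single buffered write. The function name compare_size_reports and the sample payload are made up for illustration. The file is opened in binary mode on purpose, so that tell() is a plain byte count (in Python 3 text mode the value is officially opaque).

```python
import os
import tempfile


def compare_size_reports(payload=b"<Service>example</Service>\n"):
    """Write payload to a temp file and return three size readings.

    Hypothetical helper for illustration only. Returns a tuple of
    (getsize before flush, tell() position, getsize after flush).
    """
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)  # only the path is needed; reopen it with open() below
    try:
        with open(path, "wb") as out:
            out.write(payload)
            # A small write usually stays in Python's buffer, so the
            # on-disk size can still lag behind at this point.
            on_disk_before = os.path.getsize(path)
            # tell() accounts for bytes that are still buffered.
            position = out.tell()
            # flush() pushes the buffer to the OS; getsize catches up.
            out.flush()
            on_disk_after = os.path.getsize(path)
        return on_disk_before, position, on_disk_after
    finally:
        os.remove(path)


before, position, after = compare_size_reports()
print("getsize before flush:", before)
print("tell():", position)
print("getsize after flush:", after)
```

For the splitting loop in the question, that would mean either checking f2.tell() against the threshold or flushing before each getsize() call. tell() is probably the cheaper of the two, since it avoids forcing a write on every line.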

Sri answered Sep 23 '22 13:09
