I'm creating large files with my Python script (more than 1 GB each; actually there are 8 of them). Right after creating them I have to spawn a process that will use those files.
The script looks like this:
import os
import subprocess
import threading
import time

# This is a more complex function, but it basically does this:
def use_file():
    subprocess.call(['C:\\use_file', 'C:\\foo.txt'])

f = open('C:\\foo.txt', 'wb')
for i in range(10000):
    f.write(one_MB_chunk)  # one_MB_chunk is defined elsewhere
f.flush()
os.fsync(f.fileno())
f.close()

time.sleep(5)  # With this line added it just works fine

t = threading.Thread(target=use_file)
t.start()
But the application use_file acts as if foo.txt were empty. There are some weird things going on:

- running C:\use_file C:\foo.txt in a console (after the script finished) gives correct results
- calling use_file() in another Python console gives correct results
- C:\foo.txt is visible on disk right after open() is called, but remains size 0 B until the end of the script
- with time.sleep(5) added, it just starts working as expected (or rather as required)

I've already tried:

- os.fsync(), but it doesn't seem to work (the result from use_file is as if C:\foo.txt were empty)
- buffering=(1<<20) (when opening the file), which doesn't seem to work either

I'm more and more curious about this behaviour.
Questions:

- Does Python postpone the close() operation, finishing it in the background? Where is this documented?
- How can I make this work without the sleep: is it a Windows/Python bug?

Notes (in case there's something wrong on the other side): the application use_file uses:
handle = CreateFile("foo.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                    OPEN_EXISTING, 0, NULL);
size = GetFileSize(handle, NULL);
And then it processes size bytes from foo.txt.
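In Python terms, the consumer's logic is roughly the following (a sketch; the real use_file is a native Win32 app, and consume() is a hypothetical stand-in):

```python
import os

def consume(path):
    # Rough Python equivalent of the consumer's Win32 logic: ask for
    # the file's size ONCE at open time, then process exactly that
    # many bytes. If the size still reads as 0 (data not yet flushed),
    # the consumer behaves as if the file were empty.
    with open(path, 'rb') as f:
        size = os.path.getsize(path)   # analogous to GetFileSize()
        return f.read(size)

# demo with a throwaway file whose data has definitely reached the OS
with open('demo.bin', 'wb') as f:
    f.write(b'x' * 1024)
    f.flush()
    os.fsync(f.fileno())

print(len(consume('demo.bin')))   # 1024
```

This shows why the timing matters: the consumer never re-checks the size, so whatever it sees at open time is all it will ever process.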
The flush() method clears Python's internal file buffer. Files are flushed automatically when they are closed, but a programmer can flush a file earlier by calling flush() explicitly. Until then, writes are batched together in the buffer and handed over in larger chunks; these internal buffers are created by the runtime/library you're programming against and exist to speed things up by avoiding a system call for every write. Calling flush() forces anything pending in the write buffer to be handed to the OS (it does not by itself guarantee the data reaches the disk).
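You can watch the internal buffer in action: a small write stays invisible to the filesystem until flush() (a sketch, assuming the write is smaller than the default buffer size):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'buffered.bin')
f = open(path, 'wb')                   # buffered binary mode
f.write(b'hello')                      # lands in Python's internal buffer
size_before = os.path.getsize(path)    # the OS hasn't seen the bytes yet
f.flush()                              # hand the buffer to the OS
size_after = os.path.getsize(path)
f.close()

print(size_before, size_after)         # 0 5
```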
f.close() calls f.flush(), which sends the data to the OS. That doesn't necessarily write the data to disk, because the OS buffers it. As you rightly worked out, if you want to force the OS to write it to disk, you need os.fsync().
Have you considered just piping the data directly into use_file?
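For example (a hypothetical sketch: a small Python child process stands in for use_file, which would have to be changed to read its data from stdin):

```python
import subprocess
import sys

one_MB_chunk = b'\x00' * (1 << 20)

# The child here just counts the bytes it receives on stdin; the real
# use_file would do its processing instead.
child = subprocess.Popen(
    [sys.executable, '-c',
     'import sys; print(len(sys.stdin.buffer.read()))'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)

for _ in range(3):                     # 10000 in the real script
    child.stdin.write(one_MB_chunk)
child.stdin.close()                    # EOF: no flush/fsync dance needed

received = int(child.stdout.read())
child.wait()
print(received)                        # 3145728
```

Piping sidesteps the disk entirely, so there's nothing for the OS to buffer behind your back.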
EDIT: you say that os.fsync()
'doesn't work'. To clarify, if you do
f = open(...)
# write data to f
f.flush()
os.fsync(f.fileno())
f.close()
import pdb; pdb.set_trace()
and then look at the file on disk, does it have data?