I am routinely missing the last few KB of a file I am trying to copy using shutil.copyfile.
I did some research and do see someone asking about something similar here: python shutil copy function missing last few lines
But I am using copyfile, which DOES seem to use a with statement...
with open(src, 'rb') as fsrc:
    with open(dst, 'wb') as fdst:
        copyfileobj(fsrc, fdst)
So I am perplexed that more users aren't having this issue. If it really is some sort of buffering issue, I would think it'd be better known.
I am calling copyfile very simply and don't think I could possibly be doing something wrong. I am essentially doing it the standard way, I think:
copyfile(target_file_name, dest_file_name)
Yet I am missing the last 4 KB or so of the file each time.
I have also not touched the copyfileobj function that copyfile calls inside shutil, which is:
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
So I am at a loss, but I suppose I am about to learn something about flushing, buffering, or the with statement, or ... Help! thanks
To Anand: I avoided mentioning that stuff because my sense is that it's not the problem, but since you asked... The executive summary is that I am grabbing a file from an FTP server and checking whether it differs from the last copy I saved; if so, I download the file and save a copy. It's circuitous spaghetti code, written when I was a truly pure utilitarian novice of a coder, I guess. It looks like:
for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
        try:
            ftp.retrbinary('RETR %s' % filename, fhandle.write)
            the_files.append(target_file_name)
            mtime = modification_date(target_file_name)
            mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16] + str(mtime)[17:19] + str(mtime)[20:28]#2014-12-11 15:08:00.338415.
            sorted_xml_files = [file for file in glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))]
            sorted_xml_files.sort(key=os.path.getmtime)
            last_file = sorted_xml_files[-1]
            file_is_the_same = filecmp.cmp(target_file_name, last_file)
            if not file_is_the_same:
                print 'File changed!'
                copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml')
            else:
                print 'File '+ last_file +' hasn\'t changed, doin nothin'
                continue
The issue here is most probably in this line:
ftp.retrbinary('RETR %s' % filename, fhandle.write)
It uses fhandle.write() to write the data from the FTP server to the file (named target_file_name), but by the time you call shutil.copyfile, the buffer for fhandle has not been completely flushed. The last few KB are still held in memory rather than on disk, so they are missing from the copy.
To make sure this does not happen, you can either move the copyfile logic out of the with block for fhandle, or call fhandle.flush() to flush the buffer before copying the file.
I believe it would be better to close the file (that is, move the logic out of the with block). Example:
for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name ,'wb') as fhandle:
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
    # The with block has ended, so fhandle is closed and fully flushed to disk.
    the_files.append(target_file_name)
    mtime = modification_date(target_file_name)
    mtime_str_for_file = str(mtime)[0:10] + str(mtime)[11:13] + str(mtime)[14:16] + str(mtime)[17:19] + str(mtime)[20:28]  # e.g. 2014-12-11 15:08:00.338415
    sorted_xml_files = glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))
    sorted_xml_files.sort(key=os.path.getmtime)
    last_file = sorted_xml_files[-1]
    file_is_the_same = filecmp.cmp(target_file_name, last_file)
    if not file_is_the_same:
        print 'File changed!'
        # Safe to copy now: the whole download is on disk.
        copyfile(target_file_name, '\\\\Storage\\shared\\'+'datebreaks'+mtime_str_for_file+'.xml')
    else:
        print 'File '+ last_file +' hasn\'t changed, doin nothin'
        continue
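If you want to see the buffering effect without an FTP server, here is a minimal sketch that should reproduce it with a temporary directory. The function name copy_while_open, the flush_first parameter, and the file names are all hypothetical and made up for this illustration; none of them come from the code in the question. It writes a small payload, copies the file while the writer is still open (as the original loop does), then copies it again after the with block has closed it.

```python
import os
import shutil
import tempfile


def copy_while_open(payload, flush_first=False):
    """Copy a file while its writer is still open, then again after it is closed.

    Hypothetical helper for illustration only.
    Returns (bytes_in_early_copy, bytes_in_late_copy).
    """
    workdir = tempfile.mkdtemp()
    try:
        src = os.path.join(workdir, "source.bin")
        early = os.path.join(workdir, "early_copy.bin")
        late = os.path.join(workdir, "late_copy.bin")

        with open(src, "wb") as fhandle:
            fhandle.write(payload)  # a small write stays in fhandle's buffer
            if flush_first:
                fhandle.flush()  # push the buffered bytes out to the OS
            shutil.copyfile(src, early)  # copy made inside the with block

        # Leaving the with block closed (and therefore flushed) the file.
        shutil.copyfile(src, late)
        return os.path.getsize(early), os.path.getsize(late)
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    data = b"x" * 100
    print(copy_while_open(data))                    # early copy vs. late copy
    print(copy_while_open(data, flush_first=True))  # same, with flush() first
```

With the default buffered open(), the early copy should come out shorter than the late one unless flush_first is set, which matches the missing tail described in the question. Closing the file first is still the simpler fix, since it does not rely on remembering to flush.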