 

Python shutil copyfile - missing last few lines

I am routinely missing the last few KB of a file that I copy with shutil.copyfile.

I did some research and found someone asking about something similar here: python shutil copy function missing last few lines

But I am using copyfile, which DOES seem to use a with statement...

with open(src, 'rb') as fsrc:
    with open(dst, 'wb') as fdst:
        copyfileobj(fsrc, fdst)

So I am perplexed that more users aren't running into this, if it really is some sort of buffering issue. I would think it would be better known.

I am calling copyfile in the simplest, standard way, so I don't see how I could be doing something wrong:

copyfile(target_file_name, dest_file_name)

Yet I am missing roughly the last 4 KB of the file every time.

I have also not touched copyfileobj, the function that copyfile calls in shutil:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

So I am at a loss, but I suppose I am about to learn something about flushing, buffering, the with statement, or... Help! Thanks.
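A quick way to rule out copyfile itself is to copy a file that has already been closed and compare the two byte for byte. This is a minimal sketch for Python 3; the file names are hypothetical, and filecmp.cmp(..., shallow=False) is used so the check compares actual contents.

```python
import filecmp
import os
import shutil
import tempfile


def copy_closed_file_is_complete(size=20000):
    """Write `size` random bytes, close the file, copy it, and compare contents."""
    tmpdir = tempfile.mkdtemp()
    try:
        src = os.path.join(tmpdir, 'source.xml')  # hypothetical file names
        dst = os.path.join(tmpdir, 'copy.xml')
        with open(src, 'wb') as f:
            f.write(os.urandom(size))
        # The with block has exited, so src is closed and fully flushed here.
        shutil.copyfile(src, dst)
        return filecmp.cmp(src, dst, shallow=False)
    finally:
        shutil.rmtree(tmpdir)


print(copy_closed_file_is_complete())  # True
```

If this returns True, the truncation is coming from how the source file is being written, not from copyfile.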


To Anand: I avoided mentioning this because I don't think it's the problem, but since you asked, here is the short version. I am grabbing a file from an FTP server, checking whether it differs from the last copy I saved, and if so, downloading the file and saving a copy. It's circuitous spaghetti code that I wrote back when I was a truly utilitarian novice coder. It looks like this:

import filecmp
import glob
import os
from shutil import copyfile

# ftp (an ftplib.FTP connection), filematch, the_files and modification_date()
# are defined elsewhere in the script.
for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name, 'wb') as fhandle:
        # Everything below still runs while fhandle is open.
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
        the_files.append(target_file_name)
        mtime = modification_date(target_file_name)
        stamp = str(mtime)  # e.g. 2014-12-11 15:08:00.338415
        mtime_str_for_file = stamp[0:10] + stamp[11:13] + stamp[14:16] + stamp[17:19] + stamp[20:28]
        sorted_xml_files = glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))
        sorted_xml_files.sort(key=os.path.getmtime)
        last_file = sorted_xml_files[-1]
        file_is_the_same = filecmp.cmp(target_file_name, last_file)
        if not file_is_the_same:
            print('File changed!')
            copyfile(target_file_name, '\\\\Storage\\shared\\' + 'datebreaks' + mtime_str_for_file + '.xml')
        else:
            print('File ' + last_file + " hasn't changed, doin nothin")
asked Jul 21 '15 at 18:07 by 10mjg



1 Answer

The issue here is most probably this line:

ftp.retrbinary('RETR %s' % filename, fhandle.write)

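This uses fhandle.write() to write the data from the FTP server into the file target_file_name. But fhandle is a buffered file object, so the most recently received data can still be sitting in its in-memory buffer instead of in the file itself. When you call shutil.copyfile while still inside the with block, copyfile opens target_file_name separately and reads only what has actually reached the file so far, so the copy is missing whatever has not been flushed yet.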
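The effect can be reproduced without an FTP server. This is a minimal sketch for Python 3 that assumes the download arrives as a series of write() calls, the way retrbinary feeds fhandle.write. The chunk sizes and the explicit 8192-byte buffer are arbitrary choices so the result does not depend on the platform's default buffer size.

```python
import os
import shutil
import tempfile


def sizes_when_copied_inside_with(chunks=10, chunk_size=1000, buffer_size=8192):
    """Return (bytes written, size of a copy made before the file is closed)."""
    data = os.urandom(chunks * chunk_size)
    tmpdir = tempfile.mkdtemp()
    try:
        src = os.path.join(tmpdir, 'download.xml')
        dst = os.path.join(tmpdir, 'copy.xml')
        with open(src, 'wb', buffering=buffer_size) as fhandle:
            # Simulate retrbinary calling fhandle.write once per received block.
            for i in range(chunks):
                fhandle.write(data[i * chunk_size:(i + 1) * chunk_size])
            # Still inside the with block: the tail of `data` is in fhandle's
            # in-memory buffer, not yet in the file that copyfile will open.
            shutil.copyfile(src, dst)
        return len(data), os.path.getsize(dst)
    finally:
        shutil.rmtree(tmpdir)


written, copied = sizes_when_copied_inside_with()
print(written, copied)  # the copy is shorter than what was written
```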

To make sure this does not happen, you can either move the copyfile logic out of the with block for fhandle, or call fhandle.flush() to flush the buffer before copying the file.
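If the copy has to stay inside the with block, flushing first should be enough for the separate open() inside copyfile to see all the data. A minimal sketch for Python 3, assuming a hypothetical retrieve(callback) function that stands in for a call like ftp.retrbinary('RETR %s' % filename, callback):

```python
import os
import shutil
import tempfile


def download_flush_and_copy(retrieve, local_name, dest_name):
    """Save the data that retrieve(callback) delivers, flush it, then copy the file."""
    with open(local_name, 'wb') as fhandle:
        retrieve(fhandle.write)
        # Hand Python's buffered bytes to the OS so that the separate open()
        # inside copyfile sees the complete file.
        fhandle.flush()
        shutil.copyfile(local_name, dest_name)


def fake_retrieve(callback):
    """Hypothetical stand-in for ftp.retrbinary: delivers 10 blocks of 1000 bytes."""
    for i in range(10):
        callback(bytes([i]) * 1000)


tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'download.xml')
dst = os.path.join(tmpdir, 'copy.xml')
download_flush_and_copy(fake_retrieve, src, dst)
sizes = (os.path.getsize(src), os.path.getsize(dst))
shutil.rmtree(tmpdir)
print(sizes)  # (10000, 10000)
```

flush() only moves the data from Python's buffer to the operating system; os.fsync() would additionally force it onto the physical disk, which is not needed just for another read of the file.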

I believe it is better to close the file by moving the logic out of the with block; leaving the block closes fhandle, which also flushes it. For example:

import filecmp
import glob
import os
from shutil import copyfile

# ftp (an ftplib.FTP connection), filematch, the_files and modification_date()
# are defined elsewhere in the script.
for filename in ftp.nlst(filematch):
    target_file_name = os.path.basename(filename)
    with open(target_file_name, 'wb') as fhandle:
        ftp.retrbinary('RETR %s' % filename, fhandle.write)
    # The with block has ended: fhandle is closed and all data is in the file.
    the_files.append(target_file_name)
    mtime = modification_date(target_file_name)
    stamp = str(mtime)  # e.g. 2014-12-11 15:08:00.338415
    mtime_str_for_file = stamp[0:10] + stamp[11:13] + stamp[14:16] + stamp[17:19] + stamp[20:28]
    sorted_xml_files = glob.glob(os.path.join('\\\\Storage\\shared\\', '*.xml'))
    sorted_xml_files.sort(key=os.path.getmtime)
    last_file = sorted_xml_files[-1]
    file_is_the_same = filecmp.cmp(target_file_name, last_file)
    if not file_is_the_same:
        print('File changed!')
        copyfile(target_file_name, '\\\\Storage\\shared\\' + 'datebreaks' + mtime_str_for_file + '.xml')
    else:
        print('File ' + last_file + " hasn't changed, doin nothin")
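The same restructuring can be pulled into a small helper so the comparison and the copy can only run after the file has been closed. This is a sketch, not the answerer's code: download_then_copy_if_changed and retrieve are hypothetical names, with retrieve(callback) standing in for a call such as ftp.retrbinary('RETR %s' % filename, callback). shallow=False is a design choice here so the decision is based on file contents.

```python
import filecmp
import os
import shutil
import tempfile


def download_then_copy_if_changed(retrieve, local_name, last_copy, dest_name):
    """Download via retrieve(callback), close the file, then copy it to
    dest_name unless its contents match last_copy. Returns True if copied."""
    with open(local_name, 'wb') as fhandle:
        retrieve(fhandle.write)
    # Leaving the with block closed (and flushed) local_name, so the
    # comparison and the copy below both see the complete download.
    if last_copy is not None and filecmp.cmp(local_name, last_copy, shallow=False):
        return False
    shutil.copyfile(local_name, dest_name)
    return True


def fake_retrieve(callback):
    """Hypothetical stand-in for ftp.retrbinary: delivers the data in blocks."""
    for _ in range(3):
        callback(b'<row/>' * 1000)


tmpdir = tempfile.mkdtemp()
local = os.path.join(tmpdir, 'download.xml')
first_copy = os.path.join(tmpdir, 'datebreaks1.xml')
second_copy = os.path.join(tmpdir, 'datebreaks2.xml')
first = download_then_copy_if_changed(fake_retrieve, local, None, first_copy)
second = download_then_copy_if_changed(fake_retrieve, local, first_copy, second_copy)
second_copy_exists = os.path.exists(second_copy)
shutil.rmtree(tmpdir)
print(first, second, second_copy_exists)  # True False False
```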
answered Sep 23 '22 at 06:09 by Anand S Kumar
