Resume FTP download after timeout

I'm downloading files from a flaky FTP server that often times out during file transfer and I was wondering if there was a way to reconnect and resume the download. I'm using Python's ftplib. Here is the code that I am using:

#! /usr/bin/python

import ftplib
import os
import socket
import sys

#--------------------------------#
# Define parameters for ftp site #
#--------------------------------#
site           = 'a.really.unstable.server'
user           = 'anonymous'
password       = '[email protected]'
root_ftp_dir   = '/directory1/'
root_local_dir = '/directory2/'

#---------------------------------------------------------------
# Tuple of order numbers to download. Each web request generates
# an order number
#---------------------------------------------------------------
order_num = ('1','2','3','4')

#----------------------------------------------------------------#
# Loop through each order. Connect to server on each loop. There #
# might be a time out for the connection therefore reconnect for #
# every new ordernumber                                          #
#----------------------------------------------------------------#
# First change local directory
os.chdir(root_local_dir)

# Begin loop through 
for order in order_num:
    
    print 'Begin processing order number %s' %order
    
    # Connect to FTP site
    try:
        ftp = ftplib.FTP( host=site, timeout=1200 )
    except (socket.error, socket.gaierror), e:
        print 'ERROR: Unable to reach "%s"' %site
        sys.exit()
    
    # Login
    try:
        ftp.login(user,password)
    except ftplib.error_perm:
        print 'ERROR: Unable to login'
        ftp.quit()
        sys.exit()
     
    # Change remote directory to location of order
    try:
        ftp.cwd(root_ftp_dir+order)
    except ftplib.error_perm:
        print 'Unable to CD to "%s"' %(root_ftp_dir+order)
        sys.exit()

    # Get a list of files
    try:
        filelist = ftp.nlst()
    except ftplib.error_perm:
        print 'Unable to get file list from "%s"' %order
        sys.exit()
    
    #---------------------------------#
    # Loop through files and download #
    #---------------------------------#
    for each_file in filelist:
        
        file_local = open(each_file,'wb')
        
        try:
            ftp.retrbinary('RETR %s' %each_file, file_local.write)
            file_local.close()
        except ftplib.error_perm:
            print 'ERROR: cannot read file "%s"' %each_file
            os.unlink(each_file)
        
    ftp.quit()
    
    print 'Finished processing order number %s' %order
    
sys.exit()

The error that I get:

socket.error: [Errno 110] Connection timed out

Any help is greatly appreciated.

user8675309 asked Aug 03 '11 19:08


2 Answers

Resuming a download through FTP using only the facilities defined in RFC 959 requires the block transmission mode (section 3.4.2), which is selected with the MODE B command. Note that block mode is not part of the minimum implementation listed in section 5.1 of the specification (only stream mode is required there), so not all FTP server software implements it.

In the block transmission mode, as opposed to the stream transmission mode, the sender transmits the file as a series of blocks, each preceded by a short header, and may insert restart markers between data blocks. Such a marker may be re-submitted to the server with the REST command to restart a failed transfer (section 3.5).
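To make the block format more concrete, here is a minimal sketch of how a receiver could split a block-mode byte stream into file data and restart markers. It assumes the header layout from RFC 959 section 3.4.2 (one descriptor byte followed by a 16-bit byte count, taken here as big-endian). The name `parse_blocks` is made up for illustration and is not part of ftplib, so this only shows the framing, not a complete client.

```python
import struct

# Descriptor code bits from RFC 959 section 3.4.2
DESC_EOR = 128      # end of data block is end of record
DESC_EOF = 64       # end of data block is end of file
DESC_ERRORS = 32    # suspected errors in data block
DESC_RESTART = 16   # data block is a restart marker


def parse_blocks(raw):
    """Split a block-mode byte stream into (data, markers).

    markers is a list of (offset_in_data, marker_text) pairs, where
    offset_in_data is how many file bytes preceded the marker.
    Hypothetical helper; assumes a big-endian 16-bit byte count.
    """
    data = bytearray()
    markers = []
    pos = 0
    while pos + 3 <= len(raw):
        descriptor, count = struct.unpack_from('>BH', raw, pos)
        pos += 3
        payload = raw[pos:pos + count]
        if len(payload) < count:
            break  # truncated block: the transfer was cut off here
        pos += count
        if descriptor & DESC_RESTART:
            # Marker blocks carry printable marker text, not file data
            markers.append((len(data), payload.decode('ascii')))
        else:
            data.extend(payload)
        if descriptor & DESC_EOF:
            break
    return bytes(data), markers
```

After a failed transfer, the last marker in `markers` is the value a client would send back with REST, and the matching offset tells it where to truncate the local file before resuming.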

The specification says:

[...] a restart procedure is provided to protect users from gross system failures (including failures of a host, an FTP-process, or the underlying network).

However, AFAIK, the specification does not define a required lifetime for markers. It only says the following:

The marker information has meaning only to the sender, but must consist of printable characters in the default or negotiated language of the control connection (ASCII or EBCDIC). The marker could represent a bit-count, a record-count, or any other information by which a system may identify a data checkpoint. The receiver of data, if it implements the restart procedure, would then mark the corresponding position of this marker in the receiving system, and return this information to the user.

It should be safe to assume that servers implementing this feature will provide markers that are valid between FTP sessions, but your mileage may vary.

André Caron answered Nov 15 '22 09:11


A simple example for implementing a resumable FTP download using Python ftplib:

import time
from ftplib import FTP

# host, user and passwd must be defined before this point

ftp = None
finished = False

with open('bigfile', 'wb') as f:
    while not finished:
        try:
            if ftp is None:
                print("Connecting...")
                ftp = FTP(host, user, passwd)

            # Resume from however many bytes were already written locally
            rest = f.tell()
            if rest == 0:
                rest = None
                print("Starting new transfer...")
            else:
                print(f"Resuming transfer from {rest}...")
            ftp.retrbinary('RETR bigfile', f.write, rest=rest)
            print("Done")
            finished = True
        except Exception as e:
            # Drop the connection so that the next iteration reconnects
            ftp = None
            sec = 5
            print(f"Transfer failed: {e}, will retry in {sec} seconds...")
            time.sleep(sec)

More fine-grained exception handling is advisable.
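As a rough sketch of what finer-grained handling might look like, the variant below retries only on errors that usually mean a dropped connection and lets permanent errors (`ftplib.error_perm`, such as a missing file) propagate. The function name `download_with_resume`, the `connect` callable, and the choice of which exceptions count as retryable are assumptions made for illustration, so adjust them to the server at hand.

```python
import ftplib
import time

# Errors treated as transient here (an assumption, not a rule):
# socket-level failures, unexpected EOF, and 4xx/protocol errors.
RETRYABLE = (OSError, EOFError, ftplib.error_temp, ftplib.error_proto)


def download_with_resume(connect, remote_name, local_path,
                         max_attempts=5, delay=5):
    """Download remote_name to local_path, resuming after dropped connections.

    connect is a hypothetical zero-argument callable that returns a
    connected, logged-in ftplib.FTP object. Returns True on success and
    False if every attempt failed with a retryable error.
    """
    with open(local_path, 'wb') as f:
        for attempt in range(1, max_attempts + 1):
            ftp = None
            try:
                ftp = connect()
                offset = f.tell()  # bytes already received
                ftp.retrbinary('RETR ' + remote_name, f.write,
                               rest=offset or None)
            except RETRYABLE as e:
                print(f"Attempt {attempt} failed: {e}")
                if ftp is not None:
                    ftp.close()  # discard the broken connection
                time.sleep(delay)
                continue
            # Transfer finished; a failing QUIT should not trigger a retry
            try:
                ftp.quit()
            except ftplib.all_errors:
                ftp.close()
            return True
    return False
```

It could be called as `download_with_resume(lambda: ftplib.FTP(host, user, passwd, timeout=60), 'bigfile', 'bigfile')`, with `host`, `user` and `passwd` defined by the caller. Capping the number of attempts avoids looping forever when the server never recovers.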

Similarly for uploads:
Handling disconnects in Python ftplib FTP transfers file upload

Martin Prikryl answered Nov 15 '22 11:11