Getting the latest files from FTP folder (filename having spaces) in Python

I need to pull the latest files from an FTP folder. The problem is that the filenames contain spaces and follow a specific pattern. Below is the code I have implemented:

from __future__ import with_statement  # __future__ imports must come first
import sys
from ftplib import FTP
import os
import socket
import time
import pandas as pd
import numpy as np
from glob import glob
import datetime as dt

ftp = FTP('')
ftp.login('','')
ftp.cwd('')
ftp.retrlines('LIST')

filematch='*Elig.xlsx'
downloaded = []

for filename in ftp.nlst(filematch):
  print('Getting ' + filename)
  # The with block closes the local file even if the transfer fails.
  with open(filename, 'wb') as fhandle:
    ftp.retrbinary('RETR ' + filename, fhandle.write)
  downloaded.append(filename)

ftp.quit()

I understand that I can collect the listing by passing a list's append method as the callback to ftp.dir(), but since the filenames contain spaces, I am unable to split the lines correctly and pick the latest file of the type mentioned above.

Any help would be great.

Manas Jani asked Sep 20 '17 15:09


2 Answers

You can get each file's modification time by sending the MDTM command, provided the FTP server supports it, and then sort the files accordingly.

def get_newest_files(ftp, limit=None):
    """Retrieves newest files from the FTP connection.

    :ftp: The FTP connection to use.
    :limit: Stop after yielding this many files.
    """

    files = []

    # Decorate files with mtime.
    for filename in ftp.nlst():
        response = ftp.sendcmd('MDTM {}'.format(filename))
        _, mtime = response.split()
        files.append((mtime, filename))

    # Sort files by mtime and break after limit is reached.
    for index, decorated_filename in enumerate(sorted(files, reverse=True)):
        if limit is not None and index >= limit:
            break

        _, filename = decorated_filename  # Undecorate
        yield filename


downloaded = []

# Retrieves the newest file from the FTP server.
for filename in get_newest_files(ftp, limit=1):
    print('Getting ' + filename)

    with open(filename, 'wb') as file:
        ftp.retrbinary('RETR '+ filename, file.write)

    downloaded.append(filename)
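
The generator above sorts every name returned by NLST. To restrict it to the pattern from the question and compare real timestamps, something like the following could work. It is only a minimal sketch: parse_mdtm and newest_matching are hypothetical helper names, and it assumes the server answers MDTM with the usual "213 YYYYMMDDHHMMSS" reply (optionally followed by fractional seconds).

```python
from datetime import datetime
from fnmatch import fnmatch


def parse_mdtm(response):
    """Turn an MDTM reply such as '213 20170920150900' into a datetime."""
    # Assumption: the reply is '<code> <timestamp>'; any fractional
    # seconds after the first 14 digits are ignored.
    _, stamp = response.split(None, 1)
    return datetime.strptime(stamp.strip()[:14], '%Y%m%d%H%M%S')


def newest_matching(ftp, pattern):
    """Return the most recently modified name matching pattern, or None."""
    candidates = []
    for filename in ftp.nlst():
        if fnmatch(filename, pattern):
            # The name is sent as-is, so embedded spaces are preserved.
            mtime = parse_mdtm(ftp.sendcmd('MDTM ' + filename))
            candidates.append((mtime, filename))
    return max(candidates)[1] if candidates else None
```

With the connection from the question, newest_matching(ftp, '*Elig.xlsx') would return the single name to pass to RETR.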

Richard Neumann answered Nov 14 '22 08:11


The issue is that the FTP "LIST" command returns text meant for humans, and its format depends on the FTP server implementation.
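
Where the server supports the MLSD command (RFC 3659), ftplib's FTP.mlsd() in Python 3.3+ yields (name, facts) pairs instead of free-form text, so filenames with spaces need no splitting. A minimal sketch, assuming the server reports the "modify" fact; latest_from_mlsd is a hypothetical helper name:

```python
from fnmatch import fnmatch


def latest_from_mlsd(entries, pattern):
    """Pick the newest file matching pattern from (name, facts) pairs."""
    matches = [
        (facts['modify'], name)
        for name, facts in entries
        if facts.get('type', '').lower() == 'file'
        and 'modify' in facts
        and fnmatch(name, pattern)
    ]
    # 'modify' is a YYYYMMDDHHMMSS string, so string order follows time order.
    return max(matches)[1] if matches else None
```

It could be called as latest_from_mlsd(ftp.mlsd(facts=['type', 'modify']), '*Elig.xlsx'). Servers without MLSD support will reject the command, so this is not a drop-in replacement for every setup.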

Using PyFilesystem in place of the standard ftplib gives you a listing API (search the documentation for "walk") that returns Pythonic structures describing the files and directories hosted on the FTP server.

http://pyfilesystem2.readthedocs.io/en/latest/index.html

glenfant answered Nov 14 '22 09:11