Python FTP get the most recent file by date

I am using ftplib to connect to an FTP site. I want to find the most recently uploaded file and download it. I can connect to the FTP server and list the files, and I have put the listing lines in a list and converted the date field. Is there a function or module that can find the most recent date and output the whole line from the list?

#!/usr/bin/env python

import ftplib
import os
import socket
import sys
import time


HOST = 'test'


def main():
    try:
        f = ftplib.FTP(HOST)
    except (socket.error, socket.gaierror), e:
        print 'cannot reach to %s' % HOST
        return
    print "Connect to ftp server"

    try:
        f.login('anonymous','[email protected]')
    except ftplib.error_perm:
        print 'cannot login anonymously'
        f.quit()
        return
    print "logged on to the ftp server"

    data = []
    f.dir(data.append)
    for line in data:
        datestr = ' '.join(line.split()[0:2])
        orig_date = time.strptime(datestr, '%d-%m-%y %H:%M%p')


    f.quit()
    return


if __name__ == '__main__':
    main()

RESOLVED:

data = []
f.dir(data.append)
datelist = []
filelist = []
for line in data:
    col = line.split()
    datestr = ' '.join(line.split()[0:2])
    # %I (12-hour clock) is needed for %p (AM/PM) to affect the parsed hour
    date = time.strptime(datestr, '%m-%d-%y %I:%M%p')
    datelist.append(date)
    filelist.append(col[3])

combo = zip(datelist,filelist)
who = dict(combo)

for key in sorted(who.iterkeys(), reverse=True):
   print "%s: %s" % (key,who[key])
   filename = who[key]
   print "file to download is %s" % filename
   try:
       f.retrbinary('RETR %s' % filename, open(filename, 'wb').write)
   except ftplib.error_perm:
       print "Error: cannot read file %s" % filename
       os.unlink(filename)
   else:
       print "***Downloaded*** %s " % filename
   return

f.quit()
return

One problem: is it possible to retrieve just the first element from the dictionary? As written, the for loop runs only once and then returns, which gives me the first sorted value. That works, but I don't think it is good practice.
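
One way to avoid both the dictionary and the single-pass loop is to take the maximum over (date, filename) pairs directly. The sketch below assumes the DOS-style listing that the code above parses (date, time with AM/PM, size, name); `newest_file` and the sample lines are hypothetical and not part of the original script.

```python
import time


def newest_file(listing_lines):
    """Return the name of the most recently dated entry in a DOS-style listing."""
    def parse(line):
        # Split into date, time, size and name; the name keeps any spaces.
        col = line.split(None, 3)
        stamp = time.strptime(' '.join(col[0:2]), '%m-%d-%y %I:%M%p')
        return stamp, col[3]

    # struct_time values compare field by field, so max() picks the latest one.
    pairs = (parse(line) for line in listing_lines)
    return max(pairs, key=lambda pair: pair[0])[1]


# Made-up sample lines in the assumed format.
sample = [
    '01-24-12  04:01PM       1234 old.txt',
    '01-25-12  09:15AM       5678 new.txt',
]
print(newest_file(sample))
```

In the script above, `newest_file(data)` could stand in for the datelist/filelist/dict bookkeeping. Note that `max()` raises ValueError on an empty listing, so an empty directory would need its own check.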

krisdigitx asked Jan 24 '12 16:01


3 Answers

For those looking for a full solution for finding the latest file in a folder:

MLSD

If your FTP server supports the MLSD command, the solution is easy:

entries = list(ftp.mlsd())
entries.sort(key = lambda entry: entry[1]['modify'], reverse = True)
latest_name = entries[0][0]
print(latest_name)
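
An MLSD listing can also contain directories and entries for the current and parent directory, and a server is not obliged to send every fact. The following is a minimal sketch that keeps only regular files that carry a `modify` fact; `latest_from_mlsd` and the sample entries are hypothetical, and the entries are assumed to have the `(name, facts)` shape that `ftp.mlsd()` yields.

```python
def latest_from_mlsd(entries):
    """Return the name of the newest regular file, or None if there is none."""
    files = [
        (name, facts)
        for name, facts in entries
        if facts.get('type', '').lower() == 'file' and 'modify' in facts
    ]
    if not files:
        return None
    # 'modify' is a YYYYMMDDHHMMSS string, so string order matches time order.
    return max(files, key=lambda entry: entry[1]['modify'])[0]


# Made-up entries in the shape produced by ftp.mlsd().
sample_entries = [
    ('.', {'type': 'cdir', 'modify': '20180701000000'}),
    ('file1.zip', {'type': 'file', 'modify': '20180327120000'}),
    ('file2.zip', {'type': 'file', 'modify': '20180618153100'}),
    ('archive', {'type': 'dir', 'modify': '20180630000000'}),
]
print(latest_from_mlsd(sample_entries))
```

With a live connection this would be called as `latest_from_mlsd(ftp.mlsd())`.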

LIST

If you need to rely on the obsolete LIST command, you have to parse the proprietary listing it returns.

A common *nix listing looks like this:

-rw-r--r-- 1 user group           4467 Mar 27  2018 file1.zip
-rw-r--r-- 1 user group         124529 Jun 18 15:31 file2.zip

With a listing like this, this code will do:

from dateutil import parser

# ...

lines = []
ftp.dir("", lines.append)

latest_time = None
latest_name = None

for line in lines:
    # 9 columns in total: maxsplit=8 keeps a file name with spaces whole in tokens[8]
    tokens = line.split(maxsplit=8)
    time_str = tokens[5] + " " + tokens[6] + " " + tokens[7]
    time = parser.parse(time_str)
    if (latest_time is None) or (time > latest_time):
        latest_name = tokens[8]
        latest_time = time

print(latest_name)

This is a rather fragile approach.


MDTM

A more reliable, but far less efficient, approach is to use the MDTM command to retrieve the timestamps of individual files/folders:

names = ftp.nlst()

latest_time = None
latest_name = None

for name in names:
    time = ftp.voidcmd("MDTM " + name)
    if (latest_time is None) or (time > latest_time):
        latest_name = name
        latest_time = time

print(latest_name)
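
The loop above compares the raw reply strings, which works because every successful reply has the same `213 ` prefix followed by a `YYYYMMDDHHMMSS` timestamp. If you need an actual datetime value, the reply can be parsed as in this sketch; `parse_mdtm` is a hypothetical helper, and any fractional seconds in the reply are simply dropped.

```python
from datetime import datetime


def parse_mdtm(response):
    """Convert an MDTM reply such as '213 20180618153100' to a datetime."""
    code, _, stamp = response.partition(' ')
    if code != '213':
        raise ValueError('unexpected MDTM reply: %r' % response)
    # Keep the 14 digits of YYYYMMDDHHMMSS and ignore an optional '.sss' part.
    return datetime.strptime(stamp.strip()[:14], '%Y%m%d%H%M%S')


print(parse_mdtm('213 20180618153100'))
```

In the loop above, this would be used as `time = parse_mdtm(ftp.voidcmd("MDTM " + name))`.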

For an alternative version of the code, see the answer by @Paulo.


Non-standard -t switch

Some FTP servers support a proprietary, non-standard -t switch for the NLST (or LIST) command.

lines = ftp.nlst("-t")

# -t conventionally sorts newest first (as with ls -t), so the latest file comes first
latest_name = lines[0]

See How to get files in FTP folder sorted by modification time.


Downloading found file

No matter what approach you use, once you have the latest_name, you download it as any other file:

with open(latest_name, 'wb') as f:
    ftp.retrbinary('RETR '+ latest_name, f.write)

See also

  • Get the latest FTP folder name in Python
  • How to get FTP file's modify time using Python ftplib

Martin Prikryl answered Oct 20 '22 15:10


Why don't you use the following dir option?

ftp.dir('-t',data.append)

With this option the file listing is time ordered from newest to oldest. Then just retrieve the first file in the list to download it.

Santi Oliveras answered Oct 20 '22 15:10


With NLST, as shown in Martin Prikryl's answer, you can use the built-in sorted function, with the MDTM reply as the sort key:

from ftplib import FTP

ftp = FTP(host="127.0.0.1", user="u", passwd="p")
ftp.cwd("/data")
file_name = sorted(ftp.nlst(), key=lambda x: ftp.voidcmd(f"MDTM {x}"))[-1]

Paulo answered Oct 20 '22 16:10