 

Trouble Transferring data from FTP server to S3 via stream using Python

I am looking to transfer the contents of a folder from an FTP server to a bucket in S3 without writing to disk. Currently, S3 receives all of the file names in the folder, but none of the actual data: each object that ends up in S3 is only a few bytes. I'm not sure why the whole file is not being uploaded.

from ftplib import FTP
import io
import boto3


s3 = boto3.resource('s3')
s3bucketname = '<s3_bucket_name>'  # placeholder: name of the target bucket

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
ftp.cwd('pubchem/RDF/descriptor/compound')

address = 'ftp.ncbi.nlm.nih.gov/pubchem/RDF/descriptor/compound/'

filelist = ftp.nlst()

for x in range(0, len(filelist)-1):
    myfile = io.BytesIO()
    filename = 'RETR ' + filelist[x]
    resp = ftp.retrbinary(filename, myfile.write)
    myfile.seek(0)
    path = address + filelist[x]
    #putting file on s3
    s3.Object(s3bucketname, path).put(Body = resp)


ftp.quit()

Is there any way to make sure the whole file is uploaded?

Satchmo asked Dec 15 '16 19:12



1 Answer

You can transfer the data from the FTP server to S3 as a stream using Python. In AWS Lambda, the file is not downloaded to the /tmp directory; it is streamed directly from FTP into the S3 bucket.

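from ftplib import FTP
import s3fs

def lambda_handler(event, context):
    file_name = "test.txt"  # file name on the FTP server
    s3 = s3fs.S3FileSystem(anon=False)  # use AWS credentials from the environment
    ftp_path = "<ftp_path>"
    s3_path = "s3-dev"  # S3 bucket name

    with FTP("<ftp_server>") as ftp:
        ftp.login()
        ftp.cwd(ftp_path)
        # Open the S3 object for writing; leaving the "with" block closes
        # the file, which completes the upload.
        with s3.open("{}/{}".format(s3_path, file_name), "wb") as f:
            # retrbinary passes each received block of data to f.write
            ftp.retrbinary("RETR " + file_name, f.write)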
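The symptom in the question (S3 objects that are only a few bytes) is consistent with `put(Body = resp)`: `ftplib.FTP.retrbinary` returns the server's final status line (for example something starting with `226`), not the downloaded data, which was written into `myfile`. Below is a minimal sketch of the question's loop with that fixed, under the assumption that the folder contains only regular files. It also iterates over every name, since `range(0, len(filelist)-1)` skips the last file. The function name `copy_ftp_dir_to_s3` and its `prefix` parameter are hypothetical, not part of any library.

```python
import io


def copy_ftp_dir_to_s3(ftp, s3, bucket, prefix=""):
    """Copy every file in the FTP client's current directory to S3, in memory.

    ftp:    a logged-in ftplib.FTP instance, already cwd'd to the folder
    s3:     a boto3 S3 resource, e.g. boto3.resource("s3")
    bucket: name of the target bucket
    prefix: hypothetical key prefix prepended to each file name
    Returns the list of keys written.
    """
    keys = []
    for name in ftp.nlst():  # every entry, including the last one
        buf = io.BytesIO()
        # retrbinary feeds each received block to buf.write; its return
        # value is only the server's status line, which is what the
        # question's code uploaded.
        ftp.retrbinary("RETR " + name, buf.write)
        key = prefix + name
        # Upload the buffered file contents, not the status line.
        s3.Object(bucket, key).put(Body=buf.getvalue())
        keys.append(key)
    return keys
```

With the objects from the question, this could be called as `copy_ftp_dir_to_s3(ftp, s3, s3bucketname, address)` after `ftp.cwd(...)`. Each file is held in memory in full before uploading, which suits small files; for larger ones, writing into an `s3fs` file opened with `"wb"`, as in the answer's code, lets s3fs upload the data in parts instead.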
vinod_vh answered Oct 26 '22 06:10
