
How to use Paramiko getfo to download file from SFTP server to memory to process it

I am trying to download a CSV file (in-memory) from SFTP using Paramiko and import it into a pandas dataframe.

transport = paramiko.Transport((server, 22))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)

with open(file_name, 'wb') as fl:
    sftp.getfo(file_name, fl, callback=printTotals)
    df = pd.read_csv(fl, sep=' ')

The code above fails with:

OSError: File is not open for reading

I assume that I need some kind of buffer or file-like object for fl instead, since open needs a file. I am relatively new to all of this, so I would be happy if someone could help.

lukas_o asked May 25 '18 14:05



2 Answers

A simple solution that still allows you to use a progress callback is:

  • Use a BytesIO file-like object to store the downloaded file in memory;

  • Seek the file pointer back to the start of the file after downloading it, before you start reading it.

    import io

    with io.BytesIO() as fl:
        sftp.getfo(file_name, fl, callback=printTotals)
        fl.seek(0)  # rewind: getfo leaves the position at the end of the buffer
        df = pd.read_csv(fl, sep=' ')
    

With this solution, though, the file ends up in memory twice: once as raw bytes in the BytesIO buffer and once as the parsed DataFrame.
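
Why the seek is needed can be seen without any SFTP connection. The following is a minimal sketch, not Paramiko code: fake_getfo is a hypothetical stand-in for sftp.getfo that simply writes bytes to the file-like object it is given. After the write, the position sits at the end of the buffer, so a read returns nothing until the buffer is rewound.

```python
import io


def fake_getfo(data, fl):
    """Hypothetical stand-in for sftp.getfo: write data to fl, return the byte count."""
    fl.write(data)
    return len(data)


payload = b"a b\n1 2\n"
fl = io.BytesIO()
fake_getfo(payload, fl)

# Right after the "download" the position is at the end of the buffer.
at_end = fl.read()

# Rewinding makes the whole content readable again.
fl.seek(0)
rewound = fl.read()

print(at_end, rewound)
```

The same applies to any reader handed the buffer, pd.read_csv included: it starts reading from the current position, not from the beginning.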


A better solution is to implement a custom file-like object. It even allows you to download and parse the file at the same time.

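class FileWithProgress:
    """Wraps an open SFTPFile and prints progress as it is read."""

    def __init__(self, fl):
        self.fl = fl
        self.size = fl.stat().st_size  # total size of the remote file
        self.p = 0  # number of bytes read so far

    def read(self, blocksize=-1):
        # A negative size makes the underlying file read to the end.
        r = self.fl.read(blocksize)
        self.p += len(r)
        print(str(self.p) + " of " + str(self.size))
        return r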

And use it like:

with sftp.open(file_name, "rb") as fl:
    fl.prefetch()
    df = pd.read_csv(FileWithProgress(fl), sep=' ') 

For the SFTPFile.prefetch call, refer to: Reading file opened with Python Paramiko SFTPClient.open method is slow.
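
The wrapper idea can be tried locally without a server. The sketch below is an illustration under assumptions, not the class from the answer: ReadProgress is a hypothetical variant that takes the total size and a report function as arguments, because an io.BytesIO object (used here in place of the SFTP file) has no stat() method. The loop plays the role of the parser, pulling data in small blocks.

```python
import io


class ReadProgress:
    """Hypothetical wrapper: counts bytes read through it and reports progress."""

    def __init__(self, fl, size, report=print):
        self.fl = fl
        self.size = size      # total number of bytes expected
        self.p = 0            # number of bytes read so far
        self.report = report  # called with a progress string on every read

    def read(self, blocksize=-1):
        r = self.fl.read(blocksize)
        self.p += len(r)
        self.report(f"{self.p} of {self.size}")
        return r


data = b"a b\n1 2\n3 4\n"
messages = []
wrapped = ReadProgress(io.BytesIO(data), len(data), report=messages.append)

# Stand-in for a parser: read in 5-byte blocks until the source is exhausted.
chunks = []
while True:
    chunk = wrapped.read(5)
    if not chunk:
        break
    chunks.append(chunk)

print(messages)
```

Because progress is tracked inside read, it is reported while the consumer is still parsing, which is what lets downloading and parsing overlap when the wrapped object is a remote file.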


If you do not need the progress monitoring, simple code like this will do:

with sftp.open(file_name, "rb") as fl:
    fl.prefetch()
    df = pd.read_csv(fl, sep=' ') 

Martin Prikryl answered Oct 25 '22 09:10


What I ended up doing was a simpler version of that, unfortunately without a progress callback. I also needed the rb mode for reading:

with sftp.open(file_name, 'rb') as fl:
    df = pd.read_csv(fl, sep=' ')

Anyway, Martin's answer is what I was looking for!

lukas_o answered Oct 25 '22 10:10