I am trying to download a CSV file (in-memory) from SFTP using Paramiko and import it into a pandas dataframe.
    transport = paramiko.Transport((server, 22))
    transport.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    with open(file_name, 'wb') as fl:
        sftp.getfo(file_name, fl, callback=printTotals)
        df = pd.read_csv(fl, sep=' ')
The code above fails, telling me:
OSError: File is not open for reading
I assume that I need some kind of buffer or file-like object for fl instead, since open needs a file. I am relatively new to all of this, so I would be happy if someone could help.
Here, we first move to the target directory using the conn.cd() method. Then we pass the file name to the conn.get() method. This downloads the file into the local working directory on the client machine.
What you need to do is create an SSH client, then execute ls piped through grep to find your file, for example ls /srv/ftp | grep '^FTP_' to find files in the /srv/ftp directory whose names start with FTP_. Then open an SFTP connection and execute the get command to bring the files over. This code is untested but should work.
Using the Paramiko Python library to connect to an SFTP server: Paramiko is a Python interface built around the SSHv2 protocol. With Paramiko we can build client and server applications that follow the SSHv2 protocol.
A simple solution that still allows you to use a progress callback is:
Use a BytesIO file-like object to store the downloaded file in memory;
You have to seek the file pointer back to the start of the file after downloading it, before you start reading it.
    with io.BytesIO() as fl:
        sftp.getfo(file_name, fl, callback=printTotals)
        fl.seek(0)
        df = pd.read_csv(fl, sep=' ')
Though with this solution, you will end up having the file loaded to memory twice.
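The need for the seek in the BytesIO approach can be checked without an SFTP server. The following is only a sketch: fake_getfo is a hypothetical stand-in for sftp.getfo (it is not part of Paramiko), the payload is made-up data, and the standard csv module is used in place of pandas so that the example has no third-party dependencies. The callback is called with the bytes transferred so far and the total size, as a Paramiko progress callback would be.

```python
import csv
import io


def fake_getfo(data, fl, callback=None, chunk_size=8):
    """Hypothetical stand-in for sftp.getfo: writes data to fl in chunks."""
    total = len(data)
    done = 0
    for start in range(0, total, chunk_size):
        chunk = data[start:start + chunk_size]
        fl.write(chunk)
        done += len(chunk)
        if callback is not None:
            callback(done, total)  # (bytes so far, total bytes)
    return total


progress = []
payload = b"a b c\n1 2 3\n4 5 6\n"  # made-up, space-separated data

with io.BytesIO() as fl:
    fake_getfo(payload, fl, callback=lambda done, total: progress.append((done, total)))
    at_end = fl.read()  # the pointer sits at the end after writing, so this is empty
    fl.seek(0)          # rewind before parsing
    lines = fl.read().decode("utf-8").splitlines()
    rows = list(csv.reader(lines, delimiter=" "))
```

Reading right after the download returns nothing because the file pointer is still at the end of the buffer; only after seek(0) does the parser see the data.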
A better solution is to implement a custom file-like object. It will even allow you to download and parse the file at the same time.
    class FileWithProgress:

        def __init__(self, fl):
            self.fl = fl
            self.size = fl.stat().st_size
            self.p = 0

        def read(self, blocksize):
            r = self.fl.read(blocksize)
            self.p += len(r)
            print(str(self.p) + " of " + str(self.size))
            return r
And use it like:
    with sftp.open(file_name, "rb") as fl:
        fl.prefetch()
        df = pd.read_csv(FileWithProgress(fl), sep=' ')
For the SFTPFile.prefetch call, refer to: Reading file opened with Python Paramiko SFTPClient.open method is slow.
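The FileWithProgress wrapper above can also be tried offline. This is a minimal sketch under some assumptions: FakeRemoteFile is a hypothetical in-memory stand-in for the file object returned by sftp.open (it only adds a stat() method to BytesIO), the report parameter is an addition made here so the progress messages can be collected instead of printed, and the wrapper is read in a plain loop rather than passed to pandas.

```python
import io
from types import SimpleNamespace


class FakeRemoteFile(io.BytesIO):
    """Hypothetical stand-in for a remote file: BytesIO plus stat()."""

    def stat(self):
        return SimpleNamespace(st_size=len(self.getvalue()))


class FileWithProgress:
    def __init__(self, fl, report=print):
        self.fl = fl
        self.size = fl.stat().st_size
        self.p = 0
        self.report = report  # assumption: injectable so output can be captured

    def read(self, blocksize=-1):
        r = self.fl.read(blocksize)
        self.p += len(r)
        self.report(f"{self.p} of {self.size}")
        return r


messages = []
payload = b"a b c\n1 2 3\n"  # made-up data, 12 bytes
wrapped = FileWithProgress(FakeRemoteFile(payload), report=messages.append)

chunks = []
while True:
    chunk = wrapped.read(5)  # read in small blocks, as a parser would
    if not chunk:
        break
    chunks.append(chunk)
data = b"".join(chunks)
```

Each read call reports the running byte count, so progress is tracked while the consumer pulls data, with no second copy of the file held in a separate buffer.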
If you do not need the progress monitoring, simple code like this will do:
    with sftp.open(file_name, "rb") as fl:
        fl.prefetch()
        df = pd.read_csv(fl, sep=' ')
What I ended up doing was a simpler version of that, unfortunately without a callback for the progress. I also needed rb for reading:
    with sftp.open(file_name, 'rb') as fl:
        df = pd.read_csv(fl, sep=' ')
Anyway, Martin's answer is what I was looking for!