
Reading file opened with Python Paramiko SFTPClient.open method is slow

I am trying to read a netCDF file from a remote server.
I used the Paramiko package to read the file, like this:

import paramiko
from netCDF4 import Dataset

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname='hostname', username='usrname', password='mypassword')

sftp_client = client.open_sftp()
ncfile = sftp_client.open('mynetCDFfile')
b_ncfile = ncfile.read()    # ****

nc = Dataset('test.nc', memory=b_ncfile)

But ncfile.read() is very slow.

So my question is: is there an alternative way to read a netCDF file remotely, or a way to speed up paramiko.sftp_file.SFTPFile.read()?

Chun-Ye Lu asked Oct 17 '19 13:10


1 Answer

Calling SFTPFile.prefetch should increase the read speed:

ncfile = sftp_client.open('mynetCDFfile')
ncfile.prefetch()
b_ncfile = ncfile.read()

Another option is enabling read buffering, using bufsize parameter of SFTPClient.open:

ncfile = sftp_client.open('mynetCDFfile', bufsize=32768)
b_ncfile = ncfile.read()

(32768 is the value of SFTPFile.MAX_REQUEST_SIZE)

Similarly for writes/uploads:
Writing to a file on SFTP server opened using pysftp "open" method is slow.


Yet another option is to explicitly specify the amount of data to read (it makes BufferedFile.read take a more efficient code path):

ncfile = sftp_client.open('mynetCDFfile')
b_ncfile = ncfile.read(ncfile.stat().st_size)

If none of that works, you can download the whole file to memory instead:
Use pdfplumber and Paramiko to read a PDF file from an SFTP server
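
A minimal sketch of that last option, assuming sftp_client is the paramiko.SFTPClient from the question. SFTPClient.getfo copies a remote file into any writable file-like object, so an io.BytesIO buffer can stand in for a local file. The helper name download_to_memory is hypothetical and only used for illustration.

```python
import io


def download_to_memory(sftp_client, remote_path):
    """Download a remote file over SFTP and return its content as bytes.

    `sftp_client` is assumed to be a paramiko.SFTPClient (or any object
    with a compatible getfo method). The name of this helper is made up
    for this example.
    """
    buffer = io.BytesIO()
    # getfo writes the whole remote file into the given file-like object.
    sftp_client.getfo(remote_path, buffer)
    return buffer.getvalue()
```

With the names from the question, nc = Dataset('test.nc', memory=download_to_memory(sftp_client, 'mynetCDFfile')) would then open the dataset from the downloaded bytes.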


Obligatory warning: Do not use AutoAddPolicy this way. By doing so, you lose protection against MITM attacks. For a correct solution, see Paramiko "Unknown Server".
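
One possible shape of a safer connection, as a sketch rather than the linked solution itself: it assumes the server's key is already present in the user's known_hosts file. The helper name connect_with_known_hosts is hypothetical. It relies on the fact that a paramiko.SSHClient uses RejectPolicy unless another policy is set, so simply not calling set_missing_host_key_policy(paramiko.AutoAddPolicy()) keeps unknown servers rejected.

```python
def connect_with_known_hosts(client, hostname, username, password):
    """Connect `client` while verifying the server's host key.

    `client` is assumed to be a freshly created paramiko.SSHClient on
    which no AutoAddPolicy has been set, so it keeps its default
    RejectPolicy. The name of this helper is made up for this example.
    """
    # Trust only the keys already recorded in the user's known_hosts file.
    client.load_system_host_keys()
    # connect() fails if the server's key is unknown or does not match.
    client.connect(hostname=hostname, username=username, password=password)
    return client.open_sftp()
```

With the names from the question, this would be called as sftp_client = connect_with_known_hosts(paramiko.SSHClient(), 'hostname', 'usrname', 'mypassword').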

Martin Prikryl answered Oct 19 '22 21:10