
Python ftplib Optimal Block Size?

Tags: python, ftp

I'm using Python's ftplib to transfer lots and lots of data (~100 files x 2 GB) across a local network to an FTP server. This code is running on Ubuntu. Here is my call (self is my FtpClient object, which is a wrapper around the ftplib client):

# Store file.
self.ftpClient.storbinary('STOR ' + destination, fileHandle, blocksize=self.blockSize, callback=self.__UpdateFileTransferProgress)

My question is, how do I choose an optimal block size? My understanding is that the optimal block size is dependent on a number of things, not the least of which are connection speed and latency. My code will be running on many different networks with different speeds and varying amounts of congestion throughout the day. Ideally, I would like to compute the optimal block size at run time.

Would the optimal FTP transfer block size be the same as the optimal TCP window size? If this is true, and TCP window scaling is turned on, is there a way to get the optimal TCP window size from the kernel? How and when does the Linux kernel determine the optimal window size? Ideally I could ask the Linux kernel for the optimal block size, so as to avoid reinventing the wheel.

user1777820 asked Jul 11 '14 16:07




1 Answer

This is an interesting question, and I had to dig a bit deeper ;)

Anyway, here is a good example of how to determine the MTU: http://erlerobotics.gitbooks.io/erle-robotics-python-gitbook-free/content/udp_and_tcp/udp_fragmentation.html

But you should also think about the following: the MTU is a local phenomenon and may apply to only part of your local network. What you actually care about is the Path MTU, the smallest MTU along the complete transport path: http://en.wikipedia.org/wiki/Path_MTU_Discovery So you would have to know the MTU of every component involved. This can be a problem: for example, if you're using jumbo frames and a switch on the path is not, the frames have to be split. I once had a switch that did not understand jumbo frames and simply dropped them.
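To put numbers on the Path MTU idea, here is a small arithmetic sketch. The hop MTU values are made up for illustration, and the 40 bytes of overhead assume plain IPv4 and TCP headers without any options, so treat the result as a rough upper bound rather than what your network will actually do.

```python
# Hypothetical MTUs (bytes) of each link between the client and the FTP server.
hop_mtus = [9000, 1500, 9000]

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def path_mtu(mtus):
    """The path MTU is the smallest MTU of any link on the path."""
    return min(mtus)


def tcp_payload_per_packet(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


pmtu = path_mtu(hop_mtus)
print(pmtu, tcp_payload_per_packet(pmtu))
```

A single 1500-byte link in the middle is enough to cancel out the jumbo frames on either side of it.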

Now the most interesting question: the optimal blocksize. A lot of Python functions take arguments like blocksize or chunksize, but these do not set the block size of the underlying transport protocol. In ftplib, blocksize defines the read buffer: how much data is read from the file object and handed to the socket at a time. The default in ftplib is 8K (8192 bytes). So adjusting the blocksize should not really affect the speed of the transfer.
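To illustrate what the blocksize argument controls, here is a simplified sketch modelled on the read-and-send loop that storbinary runs. It is not the library source: send_in_blocks and CountingSink are hypothetical names, and the sink merely stands in for the FTP data socket so the example runs without a server.

```python
import io


def send_in_blocks(fp, conn, blocksize=8192, callback=None):
    """Read fp in blocksize chunks and hand each chunk to conn.sendall()."""
    while True:
        buf = fp.read(blocksize)
        if not buf:
            break
        conn.sendall(buf)
        if callback:
            callback(buf)  # e.g. a progress updater, called once per chunk


class CountingSink:
    """Hypothetical stand-in for the data socket; records what it is given."""

    def __init__(self):
        self.calls = 0
        self.total = 0

    def sendall(self, data):
        self.calls += 1
        self.total += len(data)


payload = b"x" * 100_000
for size in (8192, 65536):
    sink = CountingSink()
    send_in_blocks(io.BytesIO(payload), sink, blocksize=size)
    print(size, sink.calls, sink.total)
```

A larger blocksize means fewer read and sendall calls (and fewer callback invocations) for the same number of bytes. How those bytes are cut into TCP segments is still decided by the kernel.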

Controlling the MTU of the underlying transport protocol is something that is handled by the operating system and its kernel.
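If you are curious what the kernel starts with, you can ask it for the socket buffer sizes. The following is a minimal sketch: kernel_socket_buffers and read_tcp_wmem are hypothetical helper names, and the /proc path is Linux-specific. These values are only the starting point for a new socket; as far as I know Linux tunes the buffers per connection afterwards, so they are not an "optimal blocksize" for ftplib.

```python
import socket


def kernel_socket_buffers():
    """Return the kernel's send/receive buffer sizes for a fresh TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return {
            "SO_SNDBUF": s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            "SO_RCVBUF": s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        }


def read_tcp_wmem(path="/proc/sys/net/ipv4/tcp_wmem"):
    """Linux only: (min, default, max) TCP send-buffer limits, or None."""
    try:
        with open(path) as f:
            return tuple(int(v) for v in f.read().split())
    except (OSError, ValueError):
        # Not on Linux, or the file is unreadable or has an unexpected format.
        return None


print(kernel_socket_buffers())
print(read_tcp_wmem())
```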

Finally, some words about FTP. FTP is an old dinosaur that is easy to set up and use, but it is not always the best way to transfer files, especially if you transfer a lot of small files. I don't know your exact use case, so it could make sense to think about alternative transfer tools like rsync or bbcp. The latter seems to increase copy speed drastically. You really should have a look at http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html

Just my two cents...

aronadaal answered Sep 24 '22 20:09