
Why is the block size for Python httplib's reads hard coded as 8192 bytes?

I'm looking to make a fast streaming download -> upload to move large files via HTTP from one server to another.

While doing this, I noticed that httplib, which is used by urllib3 and therefore also by requests, seems to hard-code how much it fetches from a stream at a time: 8192 bytes.

https://github.com/python/cpython/blob/28453feaa8d88bbcbf6d834b1d5ca396d17265f2/Lib/http/client.py#L970

Why is this? What is the benefit of 8192 over other sizes?

Michal Charemza asked Feb 10 '18 10:02


2 Answers

From what I found, the block size should ideally match the page size reported by the resource module, but since that module is only available on Unix, the value was hardcoded to 8192 so that other systems, especially Windows, are not blocked by this. Beyond that, there is no other reason to hardcode it.

Source: https://bugs.python.org/issue21790
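
To make the fallback idea concrete, here is a minimal sketch, assuming you want the OS page size where it can be queried and 8192 elsewhere. pick_block_size is a hypothetical helper of my own, not something http.client provides.

```python
def pick_block_size(default=8192):
    """Return the OS page size when it can be queried, else the default."""
    try:
        import resource  # Unix-only module; raises ImportError on Windows
    except ImportError:
        return default
    return resource.getpagesize()


block_size = pick_block_size()
```

On most Unix systems this returns the page size (often 4096); on Windows it falls back to the default passed in.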

kawadhiya21 answered Oct 22 '22 09:10


Nginx webserver

This is from the nginx documentation:

Syntax: client_body_buffer_size size;

Default:    client_body_buffer_size 8k|16k;

Sets buffer size for reading client request body. In case the request body is larger than the buffer, the whole body or only its part is written to a temporary file. By default, buffer size is equal to two memory pages. This is 8K on x86, other 32-bit platforms, and x86-64. It is usually 16K on other 64-bit platforms

Apache WebServer

ProxyIOBufferSize Directive
Description:    Determine size of internal data throughput buffer
Syntax: ProxyIOBufferSize bytes
Default:    ProxyIOBufferSize 8192
Context:    server config, virtual host
Status: Extension
Module: mod_proxy

So Apache also uses 8192 by default as the proxy buffer size.

Apache Client

The Apache Java HttpClient documentation says:

https://hc.apache.org/httpcomponents-client-4.2.x/tutorial/html/connmgmt.html

  • CoreConnectionPNames.SOCKET_BUFFER_SIZE='http.socket.buffer-size': determines the size of the internal socket buffer used to buffer data while receiving / transmitting HTTP messages. This parameter expects a value of type java.lang.Integer. If this parameter is not set, HttpClient will allocate 8192 byte socket buffers.

Ruby Client

In Ruby, the default value is 16K:

https://github.com/ruby/ruby/blob/814daf855e0aa2c3a1164dc765378d3a092a1825/lib/net/protocol.rb#L172

Then there are the threads below:

What is a good buffer size for socket programming?

What is the best memory buffer size to allocate to download a file from Internet?

Optimum file buffer read size?

If you look at these, the consensus is 8K or 16K as the buffer size. The point is not that the size should be fixed at that value, but that it should be configurable, with 8K/16K being good enough for most situations. So I don't see a problem with Python also using 8K by default. But yes, it should have been configurable.
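
As a rough illustration of what "configurable" means here, this is a minimal sketch of a chunked reader whose block size is a parameter with an 8K default. iter_chunks is a hypothetical helper name, and the BytesIO source only stands in for a real download stream.

```python
import io


def iter_chunks(stream, chunk_size=8192):
    """Yield successive blocks of at most chunk_size bytes from a binary stream."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield block


# Stand-in for a download stream; 20000 bytes read in 16K blocks.
source = io.BytesIO(b"a" * 20000)
sizes = [len(block) for block in iter_chunks(source, chunk_size=16384)]
```

A generator like this can be handed to code that accepts an iterable body, so the read size is chosen by the caller instead of being fixed inside the library.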

Python 3.7 makes it configurable (http.client.HTTPConnection gains a blocksize argument), but that may not help your cause if you can't upgrade.
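
If you can run Python 3.7 or later, the following is a small sketch of passing blocksize to http.client.HTTPConnection when uploading a file-like body. The local server, the /upload path, and the CountingHandler, RecordingReader and upload names are all assumptions made up for the demonstration; only the blocksize argument comes from the standard library.

```python
import http.client
import io
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical upload target: replies with the number of bytes received."""

    def do_PUT(self):
        length = int(self.headers["Content-Length"])
        reply = str(len(self.rfile.read(length))).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # silence per-request logging


class RecordingReader(io.BytesIO):
    """File-like body that records the size of every read() request."""

    def __init__(self, data):
        super().__init__(data)
        self.read_sizes = []

    def read(self, size=-1):
        self.read_sizes.append(size)
        return super().read(size)


def upload(payload, blocksize):
    """PUT payload to a throwaway local server; return (bytes received, read sizes)."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    body = RecordingReader(payload)
    try:
        # blocksize controls how much is read from a file-like body per send.
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, blocksize=blocksize
        )
        conn.request("PUT", "/upload", body=body,
                     headers={"Content-Length": str(len(payload))})
        received = int(conn.getresponse().read())
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return received, body.read_sizes
```

Calling upload(b"x" * 100000, blocksize=65536) should show the body being read in 64K requests rather than 8K ones. Whether a larger block is actually faster depends on your network and servers, so it is worth measuring.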

Tarun Lalwani answered Oct 22 '22 08:10