I'm looking to make a fast streaming download -> upload to move large files via HTTP from one server to another.
While doing this, I noticed that httplib, which is used by urllib3 and therefore also by requests, seems to hard-code how much it fetches from a stream at a time to 8192 bytes:
https://github.com/python/cpython/blob/28453feaa8d88bbcbf6d834b1d5ca396d17265f2/Lib/http/client.py#L970
Why is this? What is the benefit of 8192 over other sizes?
From what I found, the block size should be the memory page size as reported by the resource module, but since that is only available on UNIX, the value was hard-coded to 8192 so that other systems, especially Windows, are not blocked on this. Otherwise there is no other reason to hard-code it.
Source: https://bugs.python.org/issue21790
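To see what the page size is on a given machine, something like the following could be used. This is only a sketch: the helper name page_size is made up for illustration, and the fallback to mmap.PAGESIZE on systems without the resource module is my own assumption, not something taken from the linked issue.

```python
import mmap


def page_size() -> int:
    """Return the memory page size in bytes (hypothetical helper)."""
    try:
        # The resource module is only available on Unix.
        import resource
        return resource.getpagesize()
    except ImportError:
        # Assumed fallback for platforms such as Windows.
        return mmap.PAGESIZE


if __name__ == "__main__":
    print("page size:", page_size())
    print("hard-coded block size in pages:", 8192 / page_size())
```

On a system with 4096-byte pages, 8192 would correspond to two pages, which matches the nginx default described below.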
Nginx webserver
This is from the nginx documentation:
Syntax: client_body_buffer_size size;
Default: client_body_buffer_size 8k|16k;
Sets buffer size for reading client request body. In case the request body is larger than the buffer, the whole body or only its part is written to a temporary file. By default, buffer size is equal to two memory pages. This is 8K on x86, other 32-bit platforms, and x86-64. It is usually 16K on other 64-bit platforms
Apache WebServer
ProxyIOBufferSize Directive
Description: Determine size of internal data throughput buffer
Syntax: ProxyIOBufferSize bytes
Default: ProxyIOBufferSize 8192
Context: server config, virtual host
Status: Extension
Module: mod_proxy
So Apache also uses 8192 by default as the proxy buffer size.
Apache Client
The Apache Java client documentation indicates 8192-byte socket buffers:
https://hc.apache.org/httpcomponents-client-4.2.x/tutorial/html/connmgmt.html
Ruby Client
In Ruby the default value is 16K:
https://github.com/ruby/ruby/blob/814daf855e0aa2c3a1164dc765378d3a092a1825/lib/net/protocol.rb#L172
Then there are the threads below:
What is a good buffer size for socket programming?
What is the best memory buffer size to allocate to download a file from Internet?
Optimum file buffer read size?
If you look at many of these, the consensus lies on 8K/16K as the buffer size. The point is not that it should be fixed to that value, but that it should be configurable, and 8K/16K should be good enough for most situations. So I don't see a problem with Python also using 8K by default. But yes, it should have been configurable.
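As an illustration of "configurable, with 8K as the default", here is a minimal sketch of a chunked copy loop between two file-like objects. The function name copy_stream and its chunk_size parameter are hypothetical; in a real download -> upload pipeline, src and dst would be the response and request streams rather than in-memory buffers.

```python
import io


def copy_stream(src, dst, chunk_size=8192):
    """Copy src to dst in pieces of at most chunk_size bytes.

    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 20000)
    target = io.BytesIO()
    # Use 16K here instead of the 8K default.
    print(copy_stream(source, target, chunk_size=16 * 1024))
```

The standard library's shutil.copyfileobj(fsrc, fdst, length) follows the same pattern, with length playing the role of chunk_size. Whether a larger chunk actually helps would need to be measured on the real transfer.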
Python 3.7 will make it configurable, but that may not help your cause if you can't upgrade to it.
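For reference, on Python 3.7 and later this could look roughly like the sketch below. It relies on the blocksize argument of http.client.HTTPConnection, which sets the buffer size used when sending a file-like message body. The host name is a placeholder, and 64K is an arbitrary value chosen for illustration, not a recommendation.

```python
import http.client

# "example.invalid" is a placeholder host. Constructing the object does not
# open a connection; that only happens when a request is sent.
conn = http.client.HTTPConnection("example.invalid", blocksize=64 * 1024)

# The chosen block size is stored on the connection object.
print(conn.blocksize)
```

Note that this sets the block size on http.client directly; whether and how a higher-level library such as requests exposes it is a separate question.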