I have the following simple web server, using Python's http.server
module:
import hashlib
import http.server


class RequestHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_PUT(self):
        md5 = hashlib.md5()
        remaining = int(self.headers['Content-Length'])
        while remaining > 0:
            data = self.rfile.read(min(remaining, 16384))
            if not data:
                break
            # Hash every block, including the last one, before looping again.
            md5.update(data)
            remaining -= len(data)
        print(md5.hexdigest())
        self.send_response(204)
        self.send_header('Connection', 'keep-alive')
        self.end_headers()


server = http.server.HTTPServer(('', 8000), RequestHandler)
server.serve_forever()
When I upload a file with curl, this works fine:
curl -vT /tmp/test http://localhost:8000/test
Because the file size is known upfront, curl sends a Content-Length: 5
header, so I know how much to read from the socket.
But if the file size is unknown, or the client decides to use chunked
Transfer-Encoding, this approach fails.
It can be simulated with the following command:
curl -vT /tmp/test -H "Transfer-Encoding: chunked" http://localhost:8000/test
If I read from self.rfile past the end of a chunk, the read blocks and the client hangs until it drops the TCP connection. At that point self.rfile.read returns empty data and the loop exits.
What would be needed to extend the above example to support chunked
Transfer-Encoding as well?
As you can see in the description of Transfer-Encoding, a chunked transmission will have this shape:
chunk1_length\r\n
chunk1 (binary data)
\r\n
chunk2_length\r\n
chunk2 (binary data)
\r\n
0\r\n
\r\n
You just have to read one line to get the next chunk's size (a hexadecimal number), then consume both the binary chunk and the \r\n that follows it. A chunk size of 0 marks the end of the body.
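To make the shape above concrete, here is a minimal sketch that builds a chunked body by hand. The helper name `encode_chunked` is made up for illustration, and the sketch assumes no chunk extensions and no trailer fields.

```python
def encode_chunked(parts):
    """Encode an iterable of bytes objects as an HTTP/1.1 chunked body.

    Hypothetical helper for illustration only.
    """
    out = bytearray()
    for part in parts:
        if not part:
            # A zero-length chunk would be read as the end of the body.
            continue
        out += b"%x\r\n" % len(part)  # chunk size, in hexadecimal
        out += part + b"\r\n"         # chunk data, then its closing CRLF
    out += b"0\r\n\r\n"               # terminating chunk, no trailers
    return bytes(out)


# A 5-byte chunk followed by a 26-byte chunk (26 is "1a" in hex).
print(encode_chunked([b"hello", b"x" * 26]))
```

Note that the size line is hexadecimal, which is why the parser has to use `int(line, 16)` rather than a plain `int(line)`.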
The following example handles requests with either a Content-Length or a Transfer-Encoding: chunked header.
from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8080


class TestHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_PUT(self):
        self.send_response(200)
        self.end_headers()

        path = self.translate_path(self.path)

        if "Content-Length" in self.headers:
            content_length = int(self.headers["Content-Length"])
            body = self.rfile.read(content_length)
            with open(path, "wb") as out_file:
                out_file.write(body)
        elif "chunked" in self.headers.get("Transfer-Encoding", ""):
            with open(path, "wb") as out_file:
                while True:
                    line = self.rfile.readline().strip()
                    chunk_length = int(line, 16)

                    if chunk_length != 0:
                        chunk = self.rfile.read(chunk_length)
                        out_file.write(chunk)

                    # Each chunk is followed by an additional empty newline
                    # that we have to consume.
                    self.rfile.readline()

                    # Finally, a chunk size of 0 is an end indication
                    if chunk_length == 0:
                        break


httpd = HTTPServer(("", PORT), TestHTTPRequestHandler)

print("Serving at port:", httpd.server_port)
httpd.serve_forever()
Note that I chose to inherit from SimpleHTTPRequestHandler instead of BaseHTTPRequestHandler, because the method SimpleHTTPRequestHandler.translate_path() can then be used to let clients choose the destination path (which may or may not be useful, depending on the use case; my example was already written to use it).
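If you would rather keep the handler from the question (hashing the body instead of writing it to a file), the same parsing can be factored into a generator. This is only a sketch: `iter_body` is a hypothetical helper name, not part of http.server. It additionally assumes chunk extensions (anything after a `;` on the size line) can be ignored and that trailer fields can be skipped.

```python
import hashlib
import http.server
import io


def iter_body(rfile, headers, block_size=16384):
    """Yield the request body piece by piece (hypothetical helper)."""
    if "chunked" in headers.get("Transfer-Encoding", "").lower():
        while True:
            size_line = rfile.readline()
            if not size_line:
                return  # connection closed before the terminating chunk
            # Hex length, optionally followed by ";extension" (ignored here).
            chunk_length = int(size_line.split(b";", 1)[0].strip(), 16)
            if chunk_length == 0:
                # Skip optional trailer fields up to the final blank line.
                while rfile.readline() not in (b"\r\n", b"\n", b""):
                    pass
                return
            remaining = chunk_length
            while remaining > 0:
                data = rfile.read(min(remaining, block_size))
                if not data:
                    return  # client went away mid-chunk
                remaining -= len(data)
                yield data
            rfile.readline()  # CRLF that terminates the chunk data
    else:
        remaining = int(headers.get("Content-Length", 0))
        while remaining > 0:
            data = rfile.read(min(remaining, block_size))
            if not data:
                return
            remaining -= len(data)
            yield data


class RequestHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_PUT(self):
        md5 = hashlib.md5()
        for data in iter_body(self.rfile, self.headers):
            md5.update(data)
        print(md5.hexdigest())
        self.send_response(204)
        self.end_headers()


# Quick check without a network connection: parse an in-memory chunked body.
sample = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(b"".join(iter_body(sample, {"Transfer-Encoding": "chunked"})))
```

Because the body is fully consumed before the response is sent, the connection can stay open for the next request under HTTP/1.1. To serve with it, create an `http.server.HTTPServer` with this `RequestHandler` and call `serve_forever()`, as in the question.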
You can test both operation modes with curl commands, as you mentioned:
# PUT with "Content-Length":
curl --upload-file "file.txt" \
"http://127.0.0.1:8080/uploaded.txt"
# PUT with "Transfer-Encoding: chunked":
curl --upload-file "file.txt" -H "Transfer-Encoding: chunked" \
"http://127.0.0.1:8080/uploaded.txt"