
How to handle chunked encoding in Python BaseHTTPRequestHandler?

I have the following simple web server, utilizing Python's http module:

import http.server
import hashlib


class RequestHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_PUT(self):
        md5 = hashlib.md5()

        remaining = int(self.headers['Content-Length'])
        while remaining > 0:
            data = self.rfile.read(min(remaining, 16384))
            if not data:
                break  # client closed the connection early
            md5.update(data)  # hash every block, including the last one
            remaining -= len(data)
        print(md5.hexdigest())

        self.send_response(204)
        self.send_header('Connection', 'keep-alive')
        self.end_headers()


server = http.server.HTTPServer(('', 8000), RequestHandler)
server.serve_forever()

When I upload a file with curl, this works fine:

curl -vT /tmp/test http://localhost:8000/test

Because the file size is known upfront, curl sends a Content-Length: 5 header, so I know how much I should read from the socket.

But if the file size is unknown, or the client decides to use chunked Transfer-Encoding, this approach fails.

It can be simulated with the following command:

curl -vT /tmp/test -H "Transfer-Encoding: chunked" http://localhost:8000/test

If I read from self.rfile past the end of the chunked body, the read blocks forever and the client hangs until it drops the TCP connection. Only then does self.rfile.read return empty data, and the loop exits.

What would be needed to extend the above example to support chunked Transfer-Encoding as well?

user582175 asked Mar 27 '20 22:03


1 Answer

As you can see in the description of Transfer-Encoding, a chunked body has this shape, where each length is written as a hexadecimal number:

chunk1_length\r\n
chunk1 (binary data)
\r\n
chunk2_length\r\n
chunk2 (binary data)
\r\n
0\r\n
\r\n
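To make the shape above concrete, here is a small Python sketch that frames a list of byte strings as a chunked body. encode_chunked is a hypothetical helper, not a standard-library function; it writes no chunk extensions or trailers, and it skips empty chunks because a size of 0 would be read as the end of the body.

```python
from typing import Iterable


def encode_chunked(chunks: Iterable[bytes]) -> bytes:
    """Frame byte strings as an HTTP/1.1 chunked body (no extensions, no trailers)."""
    parts = []
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would be taken as the end marker
        parts.append(b"%X\r\n" % len(chunk))  # chunk size in hexadecimal
        parts.append(chunk)
        parts.append(b"\r\n")  # CRLF after the chunk data
    parts.append(b"0\r\n\r\n")  # last chunk (size 0) followed by the empty line
    return b"".join(parts)
```

For example, encode_chunked([b"hello", b" world"]) yields 5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n, and a 26-byte chunk is announced as 1A, which is why the receiving side has to parse the size line as hexadecimal.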

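You just have to read one line to get the next chunk's size, then consume both the binary chunk and the CRLF that follows it. A size of 0 marks the end of the body; it may be followed by optional trailer lines before the final empty line.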
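The reading loop just described can be written as a standalone function over any binary file-like object, which makes it easy to test with io.BytesIO. This is a sketch under my own assumptions: read_chunked_body and decode_chunked are hypothetical names, chunk extensions (the optional ;name=value after the size) are ignored, trailer fields are read and discarded, and a connection that closes early raises an error instead of silently truncating the output.

```python
import io
from typing import BinaryIO


def read_chunked_body(rfile: BinaryIO, out: BinaryIO) -> int:
    """Copy a chunked body from rfile to out; return the number of payload bytes."""
    total = 0
    while True:
        # Size line: hexadecimal length, optionally followed by ";extensions"
        size_line = rfile.readline()
        if not size_line:
            raise ConnectionError("connection closed before the last chunk")
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last chunk
        data = rfile.read(size)
        if len(data) != size:
            raise ConnectionError("connection closed in the middle of a chunk")
        out.write(data)
        total += size
        # Every chunk's data is terminated by a CRLF
        if rfile.readline() not in (b"\r\n", b"\n"):
            raise ValueError("missing CRLF after chunk data")
    # Skip optional trailer fields up to and including the final empty line
    while rfile.readline() not in (b"\r\n", b"\n", b""):
        pass
    return total


def decode_chunked(data: bytes) -> bytes:
    """Convenience wrapper: decode a complete chunked body held in memory."""
    out = io.BytesIO()
    read_chunked_body(io.BytesIO(data), out)
    return out.getvalue()
```

In the handler below, the chunked branch could then shrink to a call to read_chunked_body(self.rfile, out_file) inside the with open(path, "wb") block.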

This example handles requests with either a Content-Length or a Transfer-Encoding: chunked header.

from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8080

class TestHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_PUT(self):
        path = self.translate_path(self.path)

        # If both headers are present, Transfer-Encoding takes precedence.
        if "chunked" in self.headers.get("Transfer-Encoding", "").lower():
            with open(path, "wb") as out_file:
                while True:
                    # Size line: hex length, optionally followed by ";extensions"
                    line = self.rfile.readline().strip()
                    chunk_length = int(line.split(b";", 1)[0], 16)

                    # A chunk size of 0 is the end indication
                    if chunk_length == 0:
                        break

                    chunk = self.rfile.read(chunk_length)
                    out_file.write(chunk)

                    # Each chunk is followed by a CRLF that we have to consume.
                    self.rfile.readline()

                # Consume optional trailer lines and the final empty line.
                while self.rfile.readline() not in (b"\r\n", b"\n", b""):
                    pass
        else:
            # Not chunked: the body is Content-Length bytes long
            # (zero if that header is missing too).
            content_length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(content_length)
            with open(path, "wb") as out_file:
                out_file.write(body)

        # Reply only after the whole request body has been consumed.
        self.send_response(200)
        self.end_headers()

httpd = HTTPServer(("", PORT), TestHTTPRequestHandler)

print("Serving at port:", httpd.server_port)
httpd.serve_forever()
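If a request carries both headers, HTTP/1.1 (RFC 9112, section 6.3, and RFC 7230 before it) says Transfer-Encoding overrides Content-Length, and a request with neither header has no body at all. That decision can be isolated in a small helper, sketched here under the assumption that it receives the handler's self.headers object (an http.client.HTTPMessage, whose lookups are case-insensitive); request_body_framing is a hypothetical name.

```python
from http.client import HTTPMessage
from typing import Optional, Tuple


def request_body_framing(headers: HTTPMessage) -> Tuple[str, Optional[int]]:
    """Return ("chunked", None), ("length", n) or ("none", 0) for a request."""
    transfer_encoding = headers.get("Transfer-Encoding")
    if transfer_encoding is not None:
        # Transfer-Encoding overrides Content-Length when both are present.
        codings = [c.strip().lower() for c in transfer_encoding.split(",")]
        if codings[-1] == "chunked":
            return ("chunked", None)
        # If the final coding is not chunked, the end of a request body cannot
        # be determined, so a server should reject it (e.g. 400 Bad Request).
        raise ValueError(f"unsupported Transfer-Encoding: {transfer_encoding!r}")
    content_length = headers.get("Content-Length")
    if content_length is not None:
        return ("length", int(content_length))
    # Neither header: the request has no body.
    return ("none", 0)
```

In do_PUT, a ValueError from this helper would be a natural place to answer with self.send_error(400) instead of trying to read the body.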

Note that I chose to inherit from SimpleHTTPRequestHandler instead of BaseHTTPRequestHandler because its translate_path() method, which maps the request path to a file path under the served directory, can then be used to let clients choose the destination path (which may or may not be useful, depending on the use case; my example was already written to use it).
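The chunked branch can also be exercised from Python instead of curl. The standard library's http.client chunk-encodes an iterable request body when you pass a Transfer-Encoding: chunked header together with encode_chunked=True, sending each item as one chunk followed by the terminating 0 chunk. A minimal sketch; put_chunked is a hypothetical helper name, and the host, port and path in the usage line assume the server above is running locally on port 8080.

```python
import http.client
from typing import Iterable


def put_chunked(host: str, port: int, path: str, chunks: Iterable[bytes]) -> int:
    """Send a PUT whose body is chunk-encoded by http.client; return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(
            "PUT",
            path,
            body=chunks,
            headers={"Transfer-Encoding": "chunked"},
            encode_chunked=True,  # http.client adds the size lines and the final 0 chunk
        )
        response = conn.getresponse()
        response.read()  # drain the response body before closing
        return response.status
    finally:
        conn.close()
```

For example, put_chunked("127.0.0.1", 8080, "/uploaded.txt", [b"hello", b" world"]) should return 200 and leave uploaded.txt, containing hello world, in the directory the server was started from.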

You can test both operation modes with curl commands, as you mentioned:

# PUT with "Content-Length":
curl --upload-file "file.txt" \
  "http://127.0.0.1:8080/uploaded.txt"

# PUT with "Transfer-Encoding: chunked":
curl --upload-file "file.txt" -H "Transfer-Encoding: chunked" \
  "http://127.0.0.1:8080/uploaded.txt"

j1elo answered Oct 11 '22 16:10