How to handle chunked encoding in Python BaseHTTPRequestHandler?

Question

I have the following simple web server, utilizing Python's http module:

import http.server
import hashlib


class RequestHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_PUT(self):
        md5 = hashlib.md5()

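        remaining = int(self.headers['Content-Length'])
        while remaining > 0:
            data = self.rfile.read(min(remaining, 16384))
            if not data:
                break
            # Hash the data before checking whether the body is complete,
            # otherwise the final block is never included in the digest.
            md5.update(data)
            remaining -= len(data)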
        print(md5.hexdigest())

        self.send_response(204)
        self.send_header('Connection', 'keep-alive')
        self.end_headers()


server = http.server.HTTPServer(('', 8000), RequestHandler)
server.serve_forever()

When I upload a file with curl, this works fine:

curl -vT /tmp/test http://localhost:8000/test

Because the file size is known upfront, curl sends a Content-Length: 5 header, so I know how many bytes to read from the socket.

But if the file size is unknown, or the client decides to use chunked Transfer-Encoding, this approach fails.

It can be simulated with the following command:

curl -vT /tmp/test -H "Transfer-Encoding: chunked" http://localhost:8000/test

If I read from self.rfile past the end of the body, the read blocks forever and hangs the client until it breaks the TCP connection. At that point self.rfile.read returns empty data and the loop exits.

What would be needed to extend the above example to support chunked Transfer-Encoding as well?

j1elo · Accepted Answer

As you can see in the description of Transfer-Encoding, a chunked transmission will have this shape:

chunk1_length\r\n
chunk1 (binary data)\r\n
chunk2_length\r\n
chunk2 (binary data)\r\n
0\r\n
\r\n

You just have to read one line, parse it as the hexadecimal size of the next chunk, and consume both the binary chunk and the newline that follows it.
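As a minimal sketch of that loop in isolation, the generator below decodes a chunked body from any binary file-like object. The name iter_chunks is hypothetical and not part of http.server. It assumes sizes are hexadecimal, ignores anything after a ";" on the size line (chunk extensions), and skips optional trailer lines after the final 0-size chunk.

```python
import io


def iter_chunks(stream):
    """Yield the payload of each chunk read from a binary file-like object."""
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating 0-size chunk")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("stream ended in the middle of a chunk")
        yield data
        stream.readline()  # consume the CRLF that follows the chunk data
    # After the last chunk come optional trailer lines, then an empty line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass


# Simulated wire data: two chunks, the second with a chunk extension.
wire = io.BytesIO(b"5\r\nhello\r\n6;note=x\r\n world\r\n0\r\n\r\n")
print(b"".join(iter_chunks(wire)))
```

Because it only needs readline() and read(), the same function should also work when passed self.rfile inside a request handler.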

This example would be able to handle requests either with Content-Length or Transfer-Encoding: chunked headers.

from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8080

class TestHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_PUT(self):
        self.send_response(200)
        self.end_headers()

        path = self.translate_path(self.path)

        if "Content-Length" in self.headers:
            content_length = int(self.headers["Content-Length"])
            body = self.rfile.read(content_length)
            with open(path, "wb") as out_file:
                out_file.write(body)
        elif "chunked" in self.headers.get("Transfer-Encoding", ""):
            with open(path, "wb") as out_file:
                while True:
                    # The size line is hexadecimal; drop any ";extension" part.
                    line = self.rfile.readline().split(b";", 1)[0].strip()
                    chunk_length = int(line, 16)

                    if chunk_length != 0:
                        chunk = self.rfile.read(chunk_length)
                        out_file.write(chunk)

                    # Each chunk is followed by a CRLF that we have to
                    # consume. After the final 0-size chunk, this reads the
                    # empty line that ends the body (assuming no trailers).
                    self.rfile.readline()

                    # Finally, a chunk size of 0 is an end indication
                    if chunk_length == 0:
                        break

httpd = HTTPServer(("", PORT), TestHTTPRequestHandler)

print("Serving at port:", httpd.server_port)
httpd.serve_forever()

Note that I chose to inherit from SimpleHTTPRequestHandler instead of BaseHTTPRequestHandler, because the method SimpleHTTPRequestHandler.translate_path() can then be used to let clients choose the destination path (which may or may not be useful, depending on the use case; my example was already written to use it).
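If the path handling is not needed, as in the question's handler, which only hashes the upload, the same idea can be applied without writing a file. The following is a sketch under stated assumptions, not a definitive implementation: md5_of_body and _update_from are hypothetical helper names, headers is anything with a dict-like get() (such as self.headers), and rfile is a binary stream (such as self.rfile). Each chunk is hashed in blocks of 16384 bytes so that a large chunk is never held in memory at once.

```python
import hashlib
import io

BLOCK = 16384  # same read size as the handler in the question


def _update_from(rfile, md5, count):
    """Feed exactly `count` bytes from rfile into md5, BLOCK bytes at a time."""
    while count > 0:
        data = rfile.read(min(count, BLOCK))
        if not data:
            raise ConnectionError("client closed the connection mid-body")
        md5.update(data)
        count -= len(data)


def md5_of_body(headers, rfile):
    """Return the hex MD5 of a request body sent with either framing."""
    md5 = hashlib.md5()
    # Chunked framing is checked first, before falling back to Content-Length.
    if "chunked" in headers.get("Transfer-Encoding", "").lower():
        while True:
            # Hexadecimal size, optionally followed by ";extension" data.
            size = int(rfile.readline().split(b";", 1)[0].strip(), 16)
            if size == 0:
                break
            _update_from(rfile, md5, size)
            rfile.readline()  # CRLF after the chunk data
        # Skip optional trailer fields up to the final empty line.
        while rfile.readline() not in (b"\r\n", b"\n", b""):
            pass
    else:
        _update_from(rfile, md5, int(headers.get("Content-Length", 0)))
    return md5.hexdigest()


# Simulated chunked request body, as curl would send it on the wire.
demo = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(md5_of_body({"Transfer-Encoding": "chunked"}, demo))
```

Inside do_PUT this would be called as md5_of_body(self.headers, self.rfile) before send_response(204). With protocol_version = "HTTP/1.1" and keep-alive, consuming the whole body including the terminating empty line matters, because any leftover bytes would be parsed as the start of the next request.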

You can test both operation modes with curl commands, as you mentioned:

# PUT with "Content-Length":
curl --upload-file "file.txt" \
  "http://127.0.0.1:8080/uploaded.txt"

# PUT with "Transfer-Encoding: chunked":
curl --upload-file "file.txt" -H "Transfer-Encoding: chunked" \
  "http://127.0.0.1:8080/uploaded.txt"
