
Binary lines in multipart/form-data (file upload)

I'm writing a simple webserver in Python that allows a user to upload a file using multipart/form-data. As far as I can tell, multipart MIME data is supposed to be line-based. For instance, the boundary has to be at the beginning of a line.

I can't figure out how binary data is handled in this regard. My client (Firefox) is not encoding it into 7bit ASCII or anything, it's just raw binary data it's sending. Does it split the data into lines at arbitrary locations? Is there a maximum line length specified for multipart data? I've tried looking through the RFC for multipart/form-data, but didn't find anything.

brianmearns asked Mar 27 '13 16:03



1 Answer

After digging through the RFCs, I think I finally got it all straight in my head. The body parts (i.e., the body content of an individual part in a multipart/* message) only need to be line-based in that the boundary at the end of the part begins with a CR+LF. Otherwise, the data need not be line-based: if the content happens to have linebreaks in it, there is no maximum distance between them, nor do they need to be escaped in any way (well, unless perhaps the Content-Transfer-Encoding is quoted-printable or base64). The 7bit, 8bit, and binary options for Content-Transfer-Encoding don't actually indicate that any encoding has been done on the data (and therefore no encoding needs to be undone); they're just meant to indicate the type of data you can expect to see in the body part.
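To make that concrete, here is a minimal sketch of what a single-part upload body looks like on the wire. The boundary string, field name, and filename are made up for illustration, and the headers are only the typical ones a browser might send. The point is that the payload sits between the blank line and the CR+LF that introduces the boundary, byte for byte, with no line wrapping or escaping.

```python
boundary = b"XyZ"            # hypothetical boundary; real ones are long and random
payload = bytes(range(256))  # raw binary, including lone CR and LF bytes

body = b"".join([
    b"--" + boundary + b"\r\n",
    b'Content-Disposition: form-data; name="file"; filename="blob.bin"\r\n',
    b"Content-Type: application/octet-stream\r\n",
    b"\r\n",                               # blank line ends the part headers
    payload,                               # sent verbatim, no encoding applied
    b"\r\n--" + boundary + b"--\r\n",      # CR+LF belongs to the delimiter
])

# Recover the content: everything after the blank line, up to CR+LF+"--"+boundary.
headers, _, rest = body.partition(b"\r\n\r\n")
content, _, _ = rest.partition(b"\r\n--" + boundary)
```

This whole-body approach only works when the entire request fits in memory, which is exactly the limitation the rest of this answer is about.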

What I was really getting at in my [poorly expressed] question was how to read/buffer the data from the socket so that I could make sure I caught the boundary, and without having to have an arbitrarily large buffer (e.g., if there happened to be no linebreaks in the content, and so a readline ended up buffering the entire thing).

What I ended up doing was buffering from the socket with a readline using a maximum length, so the buffer would never be longer than that, but would also make sure to terminate if a linebreak was encountered. This ensured that when the boundary came (following a CR+LF), it would be at the beginning of the buffer. I had to do a little extra monkeying around to ensure I didn't include that final CR+LF in the actual body content, because according to the RFC it's required before the boundary, and therefore not part of the content itself.
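A rough sketch of that approach follows. It is not my original server code: the function name `copy_part_body`, its parameters, and the `sink` object are invented for illustration, and it assumes `stream` is a binary file-like object with a `readline(size)` method (for a socket, `sock.makefile("rb")` gives you one). It also assumes the part headers have already been consumed, and it does a simple prefix match on the delimiter rather than validating the rest of the delimiter line.

```python
import io


def copy_part_body(stream, boundary, sink, max_len=8192):
    """Copy one part's body from `stream` to `sink`, stopping at the delimiter.

    Returns True if the delimiter was the closing one ("--boundary--").
    """
    delimiter = b"--" + boundary
    if max_len < len(delimiter) + 4:
        raise ValueError("max_len is too small to hold a delimiter line")
    held = b""  # trailing CR or CR+LF withheld until we know it is content
    while True:
        chunk = stream.readline(max_len)  # stops at LF or after max_len bytes
        if not chunk:
            raise ValueError("stream ended before the boundary was found")
        # A delimiter only counts when the preceding bytes were CR+LF. That
        # CR+LF belongs to the delimiter, so it is dropped, not written.
        if held == b"\r\n" and chunk.startswith(delimiter):
            return chunk[len(delimiter):len(delimiter) + 2] == b"--"
        data = held + chunk
        if data.endswith(b"\r\n"):
            keep = 2
        elif data.endswith(b"\r"):
            keep = 1  # the matching LF may arrive in the next read
        else:
            keep = 0
        held = data[len(data) - keep:]
        sink.write(data[:len(data) - keep])


# Demo: 5120 bytes of binary content with no CR+LF pair in it, read 64 bytes at a time.
content = bytes(range(256)) * 20
stream = io.BytesIO(content + b"\r\n--XyZ--\r\n")
sink = io.BytesIO()
is_last = copy_part_body(stream, b"XyZ", sink, max_len=64)
```

The buffer never grows past `max_len` plus the two withheld bytes, regardless of how far apart the linebreaks in the content are. Withholding a lone trailing CR covers the case where the length limit happens to split a CR+LF pair across two reads.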

brianmearns answered Nov 13 '22 02:11
