
Parse multipart/form-data using cgi.FieldStorage; None keys

The following piece of code should be able to run in Python 2.7 and Python 3.x.

from __future__ import unicode_literals
from __future__ import print_function

import cgi
try:
    from StringIO import StringIO as IO
except ImportError:
    from io import BytesIO as IO

body = """
--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-Type: binary/octet-stream

value1
--spam--
"""

parsed = cgi.FieldStorage(
    IO(body.encode('utf-8')),
    headers={'content-type': 'multipart/form-data; boundary=spam'},
    environ={'REQUEST_METHOD': 'POST'})

print([key for key in parsed])

In Python 2.7 it runs fine and it outputs ['param1']. In Python 3.4 however, it outputs [None].

I cannot get FieldStorage to produce a usable result in Python 3. I suspect something changed internally and I'm now using it wrong, but I can't figure out what. Any help is appreciated.

siebz0r asked Oct 01 '15 14:10



2 Answers

These changes will make your script work identically in both Python 2.7.x and 3.4.x:

(I will use these abbreviations for cgi.FieldStorage(): Python 2.7.x: FS27, Python 3.4.x: FS34)

1 - FS27 handles the newline before the boundary correctly, but FS34 does not, so the body has to start directly with the boundary line (--spam).

body = """--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-type: binary/octet-stream

value1
--spam--
"""

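One way to see why the leading newline matters: a parser that takes the first line of the stream as the opening boundary gets an empty line from the original body. The sketch below only inspects the first line of each body with io.BytesIO and never calls cgi; that FS34 reads the first line this way is an assumption inferred from the behaviour described above, and first_line is a hypothetical helper.

```python
from io import BytesIO

PART = (
    'Content-Disposition: form-data; name="param1"; filename=blob\n'
    'Content-Type: binary/octet-stream\n'
    '\n'
    'value1\n'
    '--spam--\n'
)

original_body = '\n--spam\n' + PART  # like the triple-quoted string that starts with a newline
fixed_body = '--spam\n' + PART       # starts directly with the boundary


def first_line(body):
    """Return the first line a line-based parser would read from the body."""
    return BytesIO(body.encode('utf-8')).readline()


print(repr(first_line(original_body)))  # an empty line, not the boundary
print(repr(first_line(fixed_body)))     # the boundary line
```
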
2 - Quoting from the cgi.py source (the FS34 constructor docstring):

Arguments, all optional:

fp : file pointer; default: sys.stdin.buffer (not used when the request method is GET)

        Can be :
        1. a TextIOWrapper object
        2. an object whose read() and readline() methods return bytes

The "Can be" note is not present in the FS27 docstring, so most of the differences between FS27 and FS34 lie in the handling of strings (FS27) versus binary streams (FS34).

In this context, FS34 can easily garble the parsed object unless it is told how to read the stream. Apparently, the headers entry 'content-type': 'multipart/form-data; boundary=spam' alone is not enough; you also have to supply the message length.

You can achieve this, effectively, by adding a second entry in headers:

headers={'content-type': 'multipart/form-data; boundary=spam;',
         'content-length': len(body.encode('utf-8'))}

where the value for the content-length key is the length of the encoded body in bytes (including the start/end boundaries).
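
Content-Length counts bytes, so for a body containing non-ASCII characters the length of the text and the length of the encoded bytes differ. A minimal sketch of that point, assuming UTF-8 encoding; multipart_headers is a hypothetical helper, not part of cgi:

```python
def multipart_headers(body_text, boundary='spam'):
    """Hypothetical helper: encode the body and build matching headers."""
    body_bytes = body_text.encode('utf-8')
    headers = {
        'content-type': 'multipart/form-data; boundary=' + boundary,
        # Content-Length counts bytes of the encoded body, not characters.
        'content-length': len(body_bytes),
    }
    return body_bytes, headers


# The payload contains one non-ASCII character (e with acute accent).
text = (u'--spam\n'
        u'Content-Disposition: form-data; name="param1"\n'
        u'\n'
        u'caf\u00e9\n'
        u'--spam--\n')
body_bytes, headers = multipart_headers(text)
print(len(text))                  # number of characters
print(headers['content-length'])  # number of bytes after encoding
```

For the all-ASCII body in the question both numbers are the same, which is why len(body) happens to work there.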


These modifications, combined, lead to the desired result:

$ python script.py
['param1']
$ python3 script.py
['param1']
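
Putting both changes together, the script might look like the sketch below. It uses a bytes literal for the body so that the byte length is simply len(BODY), wraps the parsing in a hypothetical parse_keys helper, and guards the import because cgi is no longer in the standard library from Python 3.13 on (the helper then returns None). These choices are assumptions of this sketch, not part of the original script.

```python
try:
    import cgi
except ImportError:
    cgi = None  # the cgi module was removed in Python 3.13
from io import BytesIO

# Bytes literal that starts directly with the boundary (no leading newline).
BODY = b"""--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-Type: binary/octet-stream

value1
--spam--
"""


def parse_keys(body):
    """Parse a multipart body and return its field names (None without cgi)."""
    if cgi is None:
        return None
    parsed = cgi.FieldStorage(
        BytesIO(body),
        headers={'content-type': 'multipart/form-data; boundary=spam',
                 'content-length': len(body)},  # body is bytes, so this is a byte count
        environ={'REQUEST_METHOD': 'POST'})
    return [key for key in parsed]


print(parse_keys(BODY))
```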

As a proof of concept, these are the parsed objects returned by FS27 and FS34:

...
print(parsed)
...

yields:

FieldStorage(None, None, [FieldStorage('param1', 'blob', 'value1')])

for FS27, and

FieldStorage(None, None, [FieldStorage('param1', 'blob', b'value1')])

for FS34.

sokin answered Oct 29 '22 17:10



In both Python 2.7 and Python 3.5 (but, for some reason, not in Python 3.4), the desired output is returned by adding a Content-Length header to the part inside the request body:

body = """
--spam
Content-Disposition: form-data; name="param1"; filename=blob
Content-Length: 6
Content-Type: binary/octet-stream

value1
--spam--
"""
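
The value 6 is the byte length of the payload value1. Rather than hard-coding it, the per-part header could be computed, as in this sketch. build_body is a hypothetical helper; it starts the body directly at the boundary instead of with a newline, and whether a given Python version honours the per-part Content-Length is not something this sketch checks.

```python
def build_body(name, filename, payload, boundary='spam'):
    """Hypothetical helper: one-part multipart body with a computed Content-Length."""
    head = (
        '--{b}\n'
        'Content-Disposition: form-data; name="{n}"; filename={f}\n'
        'Content-Length: {length}\n'
        'Content-Type: binary/octet-stream\n'
        '\n'
    ).format(b=boundary, n=name, f=filename, length=len(payload))
    tail = '\n--{b}--\n'.format(b=boundary)
    # payload is bytes, so len(payload) above is a byte count.
    return head.encode('utf-8') + payload + tail.encode('utf-8')


body = build_body('param1', 'blob', b'value1')
print(body.decode('utf-8'))
```
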
kvdb answered Oct 29 '22 15:10
