Flask - headers are not converted to unicode?

I'm developing a small web service in Python using:

  • Flask (v. 0.8)
  • storm ORM (v. 0.19)
  • Apache with mod_wsgi

I have a custom HTTP header, Unison-UUID, which I use to look up information in my database.

Here's the snippet (slightly rewritten for simplicity) that I'm having trouble with:

uuid = flask.request.headers['Unison-UUID']
store = storm.locals.Store(my_database)
user = store.get(models.User, uuid)

The class User is more or less like this:

class User(Storm):
    uuid = Unicode(primary=True)
    # Other columns....

The code above fails in the following way:

  File "/Users/lum/Documents/unison-recsys/www/api/unison/unison.py", line 27, in decorated
    user = g.store.get(models.User, uuid)
  File "/Users/lum/Documents/unison-recsys/venv/lib/python2.6/site-packages/storm/store.py", line 165, in get
    variable = column.variable_factory(value=variable)
  File "/Users/lum/Documents/unison-recsys/venv/lib/python2.6/site-packages/storm/variables.py", line 396, in parse_set
    % (type(value), value))
TypeError: Expected unicode, found <type 'str'>: '00000000-0000-0000-0000-000000000009'

I don't really understand why this is happening and what I can do about it. I thought Flask was 100% unicode.

A quick fix I found is to decode the header value, i.e. uuid = uuid.decode('utf-8'). Is this really what needs to be done? It seems a bit hackish. Is there no way to get unicode directly, without having to decode it manually?

lum asked Apr 12 '12 13:04


1 Answer

At http://flask.pocoo.org/docs/api/#flask.request we read

The request object is an instance of a Request subclass and provides all of the attributes Werkzeug defines.

The word Request links to http://werkzeug.pocoo.org/docs/wrappers/#werkzeug.wrappers.Request where we read

The Request and Response classes subclass the BaseRequest and BaseResponse classes and implement all the mixins Werkzeug provides:

The word BaseRequest links to http://werkzeug.pocoo.org/docs/wrappers/#werkzeug.wrappers.BaseRequest where we read

headers
The headers from the WSGI environ as immutable EnvironHeaders.

The word EnvironHeaders links to http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.EnvironHeaders where we read

This provides the same interface as Headers and is constructed from a WSGI environment.

The word Headers is... no, it's not linked, but it should have been linked to http://werkzeug.pocoo.org/docs/datastructures/#werkzeug.datastructures.Headers where we read

Headers is mostly compatible with the Python wsgiref.headers.Headers class

where the phrase wsgiref.headers.Headers links to http://docs.python.org/dev/library/wsgiref.html#wsgiref.headers.Headers where we read

Create a mapping-like object wrapping headers, which must be a list of header name/value tuples as described in PEP 3333.

The phrase PEP 3333 links to http://www.python.org/dev/peps/pep-3333/ where there's no explicit definition of what type headers should be, but after searching for the word headers for a while we find this statement:

WSGI therefore defines two kinds of "string":

"Native" strings (which are always implemented using the type named str)
that are used for request/response headers and metadata
"Bytestrings" (which are implemented using the `bytes` type in Python 3,
and `str` elsewhere), that are used for the bodies of requests and
responses (e.g. POST/PUT input data and HTML page outputs).

That's why in Python 2 you get headers as str, not unicode: the native string type there is the byte string str.
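To see the "native string" rule in action without Flask, here is a minimal sketch using only the standard library's wsgiref.headers.Headers, the class Werkzeug's Headers is modelled on. The header name and value are taken from the question; the variable names are just for illustration.

```python
from wsgiref.headers import Headers

# Header name/value tuples, as PEP 3333 describes them
headers = Headers([("Unison-UUID", "00000000-0000-0000-0000-000000000009")])
value = headers["unison-uuid"]  # header lookups are case-insensitive

# The type is named 'str' on both Python 2 and Python 3,
# but only on Python 2 is that type a byte string.
type_name = type(value).__name__
is_byte_string = isinstance(value, bytes)
print("%s %s" % (type_name, is_byte_string))
```

On Python 3 the second value is False, because str is text there; on Python 2, bytes is an alias for str, which is exactly the situation storm complains about.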

Now let's move to decoding.

Neither your .decode('utf-8') nor mensi's .decode('ascii') (nor blindly expecting any other encoding) is universally good, because "In theory, HTTP header field values can transport anything; the tricky part is to get all parties (sender, receiver, and intermediates) to agree on the encoding." Having said that, I think you should follow Julian Reschke's advice:

Thus, the safe way to do this is to stick to ASCII, and choose an encoding on top of that, such as the one defined in RFC 5987.

after checking that the user agents (browsers) you support have implemented it.
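If you do decide to stick to ASCII, one possible way to enforce it is a small helper that fails loudly on anything else. This is only a sketch: header_to_text is a hypothetical name, not part of Flask or Werkzeug, and rejecting non-ASCII input is an assumption about what your service wants.

```python
def header_to_text(value):
    """Return a header value as text, accepting only ASCII (hypothetical helper)."""
    if isinstance(value, bytes):
        # Python 2 str or Python 3 bytes: strict decode, raises on non-ASCII
        return value.decode("ascii")
    # Already text (Python 2 unicode or Python 3 str): just check it is ASCII
    value.encode("ascii")
    return value


uuid = header_to_text(b"00000000-0000-0000-0000-000000000009")
```

A non-ASCII value raises a UnicodeError subclass instead of being silently misdecoded, so the caller can answer with a 400 rather than query the database with garbage.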

The title of RFC 5987 is "Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters".
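For a rough idea of what that encoding looks like, RFC 5987 writes an extended value as charset'language'percent-encoded-text, all in ASCII. The sketch below assumes Python 3 (urllib.parse) and uses two hypothetical helper names; it handles only the value itself, not the surrounding parameter syntax, and does no validation of the charset or language tag.

```python
from urllib.parse import quote, unquote


def encode_ext_value(text, charset="UTF-8", language=""):
    """Build an RFC 5987 style ext-value (hypothetical helper)."""
    # safe="" percent-encodes everything except letters, digits and "_.-~"
    return "%s'%s'%s" % (charset, language, quote(text, safe="", encoding=charset))


def decode_ext_value(ext_value):
    """Split charset'language'value and return (text, language)."""
    charset, language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict"), language


# Example value taken from RFC 5987 itself
text, language = decode_ext_value("UTF-8''%e2%82%ac%20rates")
```

Whatever travels on the wire stays pure ASCII, and the receiver learns the charset from the value itself instead of guessing.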

Piotr Dobrogost answered Sep 25 '22 00:09