
Flask - handling unicode text with werkzeug?

I am trying to have the browser download a file under a name that is stored in a database. To prevent filename conflicts, the file is saved on disk under a GUID, and at download time the original filename from the database is supplied to the browser. The name is written in Japanese; it displays correctly on the page, so it appears to be stored correctly in the database. The problem comes when I try to have the browser download the file under that name:

return send_from_directory(app.config['FILE_FOLDER'], name_on_disk,
                           as_attachment=True, attachment_filename=filename)

Flask throws an error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-20: 
ordinal not in range(128)

The error seems to originate not from my code, but from part of Werkzeug:

/werkzeug/http.py", line 150, in quote_header_value
value = str(value)

Why is this happening? According to their docs, Flask is "100% Unicode".

I actually had this problem before I rewrote my code, and I fixed it by modifying several things inside Werkzeug itself. I really do not want to do that for the deployed app, because it is a pain and bad practice.

Python 2.7.6 (default, Nov 26 2013, 12:52:49) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> filename = "[얼티메이트] [131225] TVアニメ「キルラキル」オリジナルサウンドトラック (FLAC).zip"
>>> print repr(filename)
'[\xec\x96\xbc\xed\x8b\xb0\xeb\xa9\x94\xec\x9d\xb4\xed\x8a\xb8] [131225] TV\xe3\x82\xa2\xe3\x83\x8b\xe3\x83\xa1\xe3\x80\x8c\xe3\x82\xad\xe3\x83\xab\xe3\x83\xa9\xe3\x82\xad\xe3\x83\xab\xe3\x80\x8d\xe3\x82\xaa\xe3\x83\xaa\xe3\x82\xb8\xe3\x83\x8a\xe3\x83\xab\xe3\x82\xb5\xe3\x82\xa6\xe3\x83\xb3\xe3\x83\x89\xe3\x83\x88\xe3\x83\xa9\xe3\x83\x83\xe3\x82\xaf (FLAC).zip'
>>> 
Lucifer N. asked Feb 17 '14 00:02


2 Answers

You should explicitly pass Unicode strings (type unicode) when dealing with non-ASCII data. Generally in Flask, bytestrings are assumed to be ASCII-encoded.
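
As a minimal sketch of that advice, assuming the filename arrives from the database driver as UTF-8 bytes (the question does not say which driver or encoding is in use), decoding once at the boundary yields a text string. The helper name to_text is hypothetical and not part of Flask or Werkzeug. The last few lines show the failure mode: encoding non-ASCII text as ASCII, which Python 2's str() does implicitly when handed a unicode object, raises UnicodeEncodeError, consistent with the value = str(value) line in the traceback.

```python
# -*- coding: utf-8 -*-
# Sketch only: runs on Python 2.7 and Python 3; to_text is a hypothetical helper.


def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding byte strings if needed."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


# Assumed input: UTF-8 bytes for "TVアニメ.zip", as a database driver might return them.
raw = b"TV\xe3\x82\xa2\xe3\x83\x8b\xe3\x83\xa1.zip"
filename = to_text(raw)

# Encoding non-ASCII text as ASCII fails; Python 2's str() attempts exactly this
# when it is handed a unicode object.
try:
    filename.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

With the name held as text throughout the application, any conversion to bytes becomes an explicit, deliberate step rather than an implicit ASCII encode.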

Markus Unterwaditzer answered Oct 29 '22 04:10


I had a similar problem. I originally had this to send the file as an attachment:

return send_file(dl_fd,
                 mimetype='application/pdf',
                 as_attachment=True,
                 attachment_filename=filename)

where dl_fd is a file descriptor for my file.

The Unicode filename didn't work because the HTTP header doesn't support raw non-ASCII text. Instead, based on information from this Flask issue and these test cases for RFC 2231, I rewrote the above to percent-encode the UTF-8 bytes of the filename and pass them through the filename* parameter:

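# Assumes Python 2 with: import urllib; from flask import make_response, send_file
response = make_response(send_file(dl_fd,
                                   mimetype='application/pdf'
                                   ))
# Percent-encode the UTF-8 bytes of the name and pass them via filename*
response.headers["Content-Disposition"] = \
    "attachment; " \
    "filename*=UTF-8''{quoted_filename}".format(
        quoted_filename=urllib.quote(filename.encode('utf8'))
    )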

return response

Based on the test cases, the above doesn't work with IE8 but works with the other browsers listed. (I personally tested Firefox, Safari, and Chrome on a Mac.)
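
A self-contained sketch of the same encoding step, without Flask, may make the header format easier to see. It assumes the filename is already a text (unicode) string, and it adds a plain ASCII filename= fallback next to filename*= for clients that ignore the extended parameter. The function name content_disposition and the fallback value are hypothetical, and whether the fallback helps IE8 was not tested here.

```python
# -*- coding: utf-8 -*-
# Sketch only: standard library, runs on Python 2.7 and Python 3.
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def content_disposition(filename, fallback="download"):
    """Build a Content-Disposition value with a percent-encoded filename*.

    filename: text (unicode) string, may contain non-ASCII characters.
    fallback: hypothetical ASCII-only name (no quotes or backslashes) for
              clients that do not understand filename*.
    """
    # safe="" percent-encodes everything except letters, digits and a few
    # unreserved punctuation characters such as "." and "-".
    encoded = quote(filename.encode("utf-8"), safe="")
    return "attachment; filename=\"{0}\"; filename*=UTF-8''{1}".format(
        fallback, encoded
    )


header_value = content_disposition(u"\u30ad\u30eb.zip", fallback="download.zip")
```

The resulting string could be assigned to response.headers["Content-Disposition"] in place of the hand-built value above.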

Timothée Boucher answered Oct 29 '22 04:10