
Python requests issues with non-ASCII file names

I'm using Python requests to send a POST request. When the attachment parameter contains non-ASCII characters, an exception is raised; when it contains only ASCII data, everything works.

Here is the call that raises the exception:

response = requests.post(
    url="https://api.mailgun.net/v2/%s/messages" % utils.config.mailDomain,
    auth=("api", utils.config.mailApiKey),
    data={
        "from": me,
        "to": recepients,
        "subject": subject,
        "html" if html else "text": message,
    },
    files=[('attachment', codecs.open(f.decode('utf8'))) for f in attachments] if attachments else [],
)

EDITS: After decoding the file name as UTF-8, I no longer get an exception, but the file is not attached. I debugged requests while attaching a file with only ASCII characters in its name, and the headers that requests builds for the attachment part are:

{'Content-Type': None, 'Content-Location': None, 'Content-Disposition': u'form-data; name="attachment"; filename="Hello.docx"'}

This succeeds: I get the mail with the attachment.

However, for a file whose name contains Hebrew characters, the headers are:

{'Content-Type': None, 'Content-Location': None, 'Content-Disposition': 'form-data; name="attachment"; filename*=utf-8\'\'%D7%91%D7%93%D7%99%D7%A7%D7%94.doc'}

I get the mail but without the file attached to it. Any ideas?

Asked by omer bach on Oct 21 '22 at 06:10


1 Answer

When the file name contains non-ASCII characters, the requests library encodes it according to RFC 2231. The result has the form you saw: filename*=utf-8''... It seems that MailGun doesn't support this standard, so attachments with non-ASCII file names are dropped. You may want to contact MailGun to confirm which format they expect for Unicode file names.
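
To make the format concrete, here is a small Python 3 sketch (the question's code is Python 2) that reproduces the filename*= value from the question using only the standard library. It illustrates the RFC 2231 encoding itself and is not the code that requests runs internally; the helper name rfc2231_filename_param is made up for this example.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

# The Hebrew file name from the question, written with escapes.
NAME = "\u05d1\u05d3\u05d9\u05e7\u05d4.doc"


def rfc2231_filename_param(name):
    """Build a filename*= parameter: charset'language'percent-encoded-bytes."""
    return "filename*=" + encode_rfc2231(name, charset="utf-8")


param = rfc2231_filename_param(NAME)
print(param)  # filename*=utf-8''%D7%91%D7%93%D7%99%D7%A7%D7%94.doc

# A server that understands the format can reverse it like this.
charset, _language, encoded = param[len("filename*="):].split("'", 2)
decoded = unquote(encoded, encoding=charset)
```

A receiver that only looks for a plain filename="..." parameter finds no file name at all in such a header, which would be consistent with the attachment going missing.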

As an imperfect workaround, you can replace the non-ASCII characters:

def replace_non_ascii(x):
    return ''.join(i if ord(i) < 128 else '_' for i in x)

Then pass the file name explicitly when calling requests (assuming attachments is a list of unicode file names):

files= [('attachment', (replace_non_ascii(f), codecs.open(f))) for f in attachments] ...
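
For reference, a Python 3 version of the same workaround might look like the sketch below. It assumes attachments is a list of file paths, opens each file in binary mode, and sends only the base name with non-ASCII characters replaced. build_files is a hypothetical helper for this example, not part of requests.

```python
import os


def replace_non_ascii(name, placeholder="_"):
    """Replace every non-ASCII character in name with placeholder."""
    return "".join(ch if ord(ch) < 128 else placeholder for ch in name)


def build_files(attachments):
    """Return (field, (filename, fileobj)) tuples usable as requests' files=.

    The caller is responsible for closing the opened files after the request.
    """
    return [
        ("attachment", (replace_non_ascii(os.path.basename(path)), open(path, "rb")))
        for path in attachments
    ]
```

The list can then be passed as files=build_files(attachments). The recipient sees a name such as _____.doc, so the original name is lost; that is the price of this workaround.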

EDITS

If you want to customize the header format, suppose that MailGun accepts the following format instead of standard RFC 2231:

filename="%D7%91%D7%93%D7%99%D7%A7%D7%94.doc"

Then you can customize filenames as:

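import urllib  # Python 2; on Python 3 the function is urllib.parse.quote

def custom_filename(x):
    # Percent-encode the UTF-8 bytes of the unicode file name.
    return urllib.quote(x.encode('utf8'))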

files= [('attachment', (custom_filename(f), codecs.open(f))) for f in attachments] ...
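
On Python 3, urllib.quote moved to urllib.parse.quote, which encodes str input as UTF-8 by default. Below is a minimal sketch of the same idea, still under the unconfirmed assumption that MailGun accepts a percent-encoded name in a plain filename="..." parameter. The in-memory file stands in for a real attachment.

```python
import io
from urllib.parse import quote


def custom_filename(name):
    """Percent-encode the UTF-8 bytes of name so the result is pure ASCII."""
    return quote(name)


# Stand-in for an opened attachment; a real call would use open(path, "rb").
content = io.BytesIO(b"example bytes")
files = [("attachment", (custom_filename("\u05d1\u05d3\u05d9\u05e7\u05d4.doc"), content))]
print(files[0][1][0])  # %D7%91%D7%93%D7%99%D7%A7%D7%94.doc
```

Because the encoded name is plain ASCII, requests should emit it as an ordinary filename="..." parameter. Whether the recipient then sees the decoded name or the literal percent signs depends on MailGun.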

Depending on MailGun's response, you may need to patch the requests code or use a lower-level library (urllib2) instead. Hopefully they can add support for RFC 2231.

Answered by ZZY on Oct 23 '22 at 02:10