
Parsing IMAP Email BODYSTRUCTURE for Attachment Names

Tags: python, email, imap

I wrote a Python script to access, manage and filter my emails via IMAP (using Python's imaplib).

To get the list of attachments for an email (without first downloading the entire email), I fetched the BODYSTRUCTURE of the email using its UID, i.e.:

imap4.uid('FETCH', emailUID, '(BODYSTRUCTURE)')

and retrieved the attachment names from there.

Normally, the "portion" containing the attachment name would look like:

("attachment" ("filename" "This is the first attachment.zip"))

But on a couple of occasions, I encountered something like:

("attachment" ("filename" {34}', 'This is the second attachment.docx'))

I read somewhere that, instead of wrapping a string in double quotes, IMAP sometimes uses curly brackets containing the string length, followed by the actual string (without quotes).

e.g.

{16}This is a string

But the string above doesn't seem to strictly adhere to that (there's a single-quote, a comma, and a space after the closing curly bracket, and the string itself is wrapped in single-quotes).

When I downloaded the entire email, the header for the message part containing that attachment seemed normal:

Content-Type: application/docx
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="This is the second attachment.docx"

How can I interpret (erm... parse) that "abnormal" body structure, making sense of the extra single-quotes, comma, etc.?

And is that "standard"?

Edwin Lee asked Jul 10 '15 07:07


1 Answer

What you're looking at is a mangled literal, perhaps damaged by cut and paste? A literal looks like

{5}
Hello

That is, the length, then a CRLF, then that many bytes (not characters):

{4}
🐮
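
As for where the stray quote, comma and space might come from: imaplib documents that each item in a FETCH response is either a bytes object or a tuple, and that in a tuple the first part is the response header and the second part is the literal's data. Printing such a tuple would produce text like {34}', 'This is the second attachment.docx', so that is a likely (but unconfirmed) source of the mangling. Here is a minimal sketch of one way to stitch the pieces back together before parsing. It assumes Python 3 (bytes responses); flatten_fetch_data is a hypothetical helper name, not part of imaplib, and re-quoting the literal is only safe for payloads without CR or LF.

```python
import re

# Matches a literal marker such as b"{34}" at the very end of a header chunk.
_LITERAL_MARKER = re.compile(rb"\{(\d+)\}$")


def flatten_fetch_data(data):
    """Join the items of an imaplib FETCH response into one bytes string.

    Hypothetical helper: literal payloads are re-emitted as quoted strings,
    so a parser that only understands "..." strings can read the result.
    """
    out = []
    for item in data:
        if isinstance(item, tuple):
            # imaplib hands back a literal as (header, payload).
            header, payload = item
            match = _LITERAL_MARKER.search(header)
            if match and int(match.group(1)) == len(payload):
                # Drop the "{N}" marker and quote the payload instead.
                header = header[:match.start()]
                escaped = payload.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
                out.append(header + b'"' + escaped + b'"')
            else:
                # Not a marker we recognise: keep both parts untouched.
                out.append(header + payload)
        else:
            out.append(item)
    return b"".join(out)


# Shape of the response from the question, shortened for illustration.
data = [
    (b'1 (BODYSTRUCTURE ("attachment" ("filename" {34}',
     b'This is the second attachment.docx'),
    b')))',
]
print(flatten_fetch_data(data))

# The count in braces is in bytes, not characters.
print(len("🐮".encode("utf-8")))
```

With the sample data above, the literal comes back in the same quoted form as the "normal" case in the question, so the same attachment-name parsing can be applied to both.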
arnt answered Oct 05 '22 02:10