I wrote a Python script to access, manage and filter my emails via IMAP (using Python's imaplib).
To get the list of attachments for an email (without first downloading the entire email), I fetched the email's BODYSTRUCTURE by UID, i.e.:
imap4.uid('FETCH', emailUID, '(BODYSTRUCTURE)')
and retrieved the attachment names from there.
Normally, the "portion" containing the attachment name would look like:
("attachment" ("filename" "This is the first attachment.zip"))
But on a couple of occasions, I encountered something like:
("attachment" ("filename" {34}', 'This is the second attachment.docx'))
I read somewhere that IMAP sometimes represents a string not in double quotes but as the string length in curly brackets, followed by the actual string (without quotes),
e.g.
{16}This is a string
But the string above doesn't seem to strictly adhere to that (there's a single-quote, a comma, and a space after the closing curly bracket, and the string itself is wrapped in single-quotes).
When I downloaded the entire email, the header for the message part containing that attachment seemed normal:
Content-Type: application/docx
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="This is the second attachment.docx"
How can I interpret (erm... parse) that "abnormal" body structure, making sense of the extra single-quotes, comma, etc.?
And is that "standard"?
What you're looking at is a mangled literal, perhaps damaged by cut and paste? A literal looks like:
{5}
Hello
That is, the length, then a CRLF, then that many bytes (not characters):
{4}
🐮
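As a rough sketch of how that could be handled in Python: the helper names below (read_literal, join_fetch_parts) are made up for illustration and are not part of imaplib. The second helper assumes that imaplib returns a literal as a (header, payload) tuple whose header ends in {N} with the CRLF already stripped, which would also explain the stray quote, comma and space if the fetch result was printed with str().

```python
import re

# Matches a literal header such as b"{34}\r\n" at a given position.
LITERAL_HEADER = re.compile(rb"\{(\d+)\}\r\n")


def read_literal(buf, pos=0):
    """Return (payload, next_pos) for the literal that starts at buf[pos]."""
    m = LITERAL_HEADER.match(buf, pos)
    if m is None:
        raise ValueError("no literal header at this position")
    size = int(m.group(1))          # length in bytes, not characters
    start = m.end()
    payload = buf[start:start + size]
    if len(payload) != size:
        raise ValueError("buffer ends before the literal is complete")
    return payload, start + size


def join_fetch_parts(data):
    """Rebuild one raw response from a list of bytes and tuples.

    Assumption: each tuple is (header ending in b"{N}", payload), with the
    CRLF between them removed, so it is put back here.
    """
    out = b""
    for part in data:
        if isinstance(part, tuple):
            header, payload = part
            out += header + b"\r\n" + payload
        else:
            out += part
    return out


# Example shaped like the response in the question (hypothetical data).
data = [
    (b'1 (UID 7 BODYSTRUCTURE ("attachment" ("filename" {34}',
     b'This is the second attachment.docx'),
    b'))',
]
raw = join_fetch_parts(data)
filename, _ = read_literal(raw, raw.index(b"{"))
print(filename.decode("utf-8"))
```

The payload is sliced by byte count before decoding, so a multi-byte character such as the cow above is read as four bytes and only then turned into one character.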